计算交易的最高进先出

发布于 2025-01-18 19:37:23 字数 1913 浏览 3 评论 0原文

我正在尝试在交易中使用先进先出的会计方法。 “最高进先出”意味着当您出售时，您首先出售最昂贵的股票。

这是我的买入和卖出（示例借自R计算总收益或使用先进先出法的损失 - 这是一个类似但不同的问题）：

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',4),rep('AMZN',3),rep('DOCU',3)),
                 Date=c(rep('01-01-2018',2),rep('01-14-2020',2),rep('01-01-2018',2),rep('01-14-2020',1),'01-01-2018','03-15-2020','04-06-2020'),
                 Price=c(100,102,102,107,2000,2010,2011,197,182,167),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))

sell = data.frame(SellTransactionID=c(1:7),
                  Ticker=c('MSFT','MSFT','AMZN','AMZN','DOCU','DOCU','DOCU'),
                  Date=c('01-07-2020','01-20-2020','01-01-2020','01-30-2020','01-15-2020','04-10-2020','04-20-2020'),
                  Price=c(97,110,2100,2050,210,205,225),
                  Quantity=c(15,12,1,3,10,5,3))

规则如下：

您首先出售最昂贵（最高价格）的股票。
您不能在购买股票之前出售股票
您不能多次出售相同的股票

示例问题：

第一次销售 (SellTransactionID = 1) 是 01-07 的 15 股 MSFT -2020年。因此，在此日期之前购买的任何商品都可以出售。根据日期，符合资格出售的股票是来自 BuyTransactionID 1 和 2 的股票。BuyTransactionID 2 是最高价格。因此，BuyTransactionID 2 的所有 10 股均已售出，其余 5 股来自 BuyTransactionID 1。

期望输出：

“售出日期”= 售出日期（不言自明） ;

'Ticker' = 已售出的股票代码；

“收益”= 销售总额；

“成本基础”= 已售股票的加权平均值。

示例解决方案：

这是 SellTransactionID 1 的解决方案。正确的解决方案会自动执行此操作并计算所有 SellTransactionID。

result <- data.frame(SellDate = '01-07-2020', Ticker = "MSFT", Proceeds = 1455, CostBasis = 101.33)

成本基础示例：

成本基础按加权平均值计算。对于前面的示例，成本基础的计算方式如下：(Quantity1 * Price1 + Quantity2 * Price2 + .....)/所有数量的总和

因此例如上面的示例：(10 * 102 + 5 * 100)/ 15

原文

I am trying to use the Highest In, First Out accounting method on trades.
Highest In, First Out means that when you sell, you sell your most expensive shares first.

Here are my buys and sells (example borrowed from R calculate aggregate gains or loss using FIFO method - this is a similar, but different problem):

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',4),rep('AMZN',3),rep('DOCU',3)),
                 Date=c(rep('01-01-2018',2),rep('01-14-2020',2),rep('01-01-2018',2),rep('01-14-2020',1),'01-01-2018','03-15-2020','04-06-2020'),
                 Price=c(100,102,102,107,2000,2010,2011,197,182,167),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))

sell = data.frame(SellTransactionID=c(1:7),
                  Ticker=c('MSFT','MSFT','AMZN','AMZN','DOCU','DOCU','DOCU'),
                  Date=c('01-07-2020','01-20-2020','01-01-2020','01-30-2020','01-15-2020','04-10-2020','04-20-2020'),
                  Price=c(97,110,2100,2050,210,205,225),
                  Quantity=c(15,12,1,3,10,5,3))

Here are the rules:

You sell the most expensive (highest price) shares first.
You cannot sell shares before you purchased them
You cannot sell the same shares multiple times

Example problem:

The first sale (SellTransactionID = 1) is 15 shares of MSFT on 01-07-2020. So, any purchase made before that date can be sold. Based on date, the eligible shares to be sold are those from BuyTransactionID 1 and 2. BuyTransactionID 2 is the highest price. Therefore, all 10 shares of BuyTransactionID 2 are sold and the remaining 5 shares come from BuyTransactionID 1.

Desired output:

'Date Sold' = the date sold (self-explanatory);

'Ticker' = the ticker sold;

'Proceeds' = the total dollar amount sold;

'Cost basis' = a weighted average of the shares sold.

Example solution:

This is the solution for SellTransactionID 1. A properly solution automates this and calculates for all SellTransactionIDs.

result <- data.frame(SellDate = '01-07-2020', Ticker = "MSFT", Proceeds = 1455, CostBasis = 101.33)

Cost Basis Example:

Cost basis is calculated as a weighted average. For the preceding example, Cost Basis is calculated as such: (Quantity1 * Price1 + Quanity2 * Price2 + .....)/sum of all Quantity(s)

So for example above: (10 * 102 + 5 * 100)/15

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

吝吻 2025-01-25 19:37:23

@DPH 的答案非常好，但不幸的是不够准确。我会解释原因。

这是一个新数据集，其中所有购买都在销售之前：

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',10)),
                 Date=c(rep('01-01-2020',10)),
                 Price=c(100,102,102,107,105,111,109,112,115,106),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))


sell = data.frame(SellTransactionID=c(1:4),
                  Ticker=c('MSFT','MSFT','MSFT', 'MSFT'),
                  Date=c('01-07-2020','01-20-2020','01-21-2020', 
                  '01-22-2020'),
                  Price=c(120,119,117, 121),
                  Quantity=c(7,12,1, 5))

如果应用 @DPH 的解决方案，您将得到以下结果：

没有改变，最近三笔交易的“Sales_Cost”也没有改变。发生这种情况是因为该函数确定首次出售后剩余的股票数量以及剩余股票的平均价格是多少。首次出售之前购买的股票不能再单独出售。它们现在被视为具有平均价格和剩余股份数量的单一实体。

例如，本例中总共购买了 76 股。首次出售出售 7 股。现在，如“Remain_Qtd”中所示，仍有 69 股保留。计算剩余股票的平均价格 - 该价格为 106.5652 美元。现在，该流程认为所有 69 股股票的定价为 106.5652 美元，剩余销售量会减少“Remain_Qtd”的数量，但不会更改“Remain_Price”。剩余股份不能再按购买时的价格考虑，它们共同构成剩余股份和平均剩余价格。

发生这种情况是因为对象 dfo 以及对象 sdf 中 dfo 的回收。特别是，该行计算平均剩余价格，然后通过 dfo 和 sdf 回收该价格。

Price = (sum(ip * iq) - v) / sum(sdf$Quantity)

Quantity = sum(sdf$Quantity) 将所有剩余份额加在一起。

我认为@DPH 的答案非常出色，但希望可以对其进行修改，以单独处理每次购买，而不是汇总过去的购买。

The answer by @DPH is excellent, but unfortunately not quite accurate enough. I will explain why.

Here is a new dataset where all the purchases precede the sales:

buy = data.frame(BuyTransactionID = c(1:10),
                 Ticker=c(rep('MSFT',10)),
                 Date=c(rep('01-01-2020',10)),
                 Price=c(100,102,102,107,105,111,109,112,115,106),
                 Quantity=c(10,10,5,5,1,1,2,12,15,15))


sell = data.frame(SellTransactionID=c(1:4),
                  Ticker=c('MSFT','MSFT','MSFT', 'MSFT'),
                  Date=c('01-07-2020','01-20-2020','01-21-2020', 
                  '01-22-2020'),
                  Price=c(120,119,117, 121),
                  Quantity=c(7,12,1, 5))

If you apply the solution from @DPH, you will get this result:

Notice that the 'Remain_Price' does not change, nor does the 'Sales_Cost' for the last three transactions. This happens because the function determines how many shares remain after the first sale and what the average price of the remaining shares is. The shares purchased preceding the first sale can no longer be sold individually. They are now treated as a single entity with an average price and the remaining number of shares.

For example, a total of 76 shares were bought in this example. The first sale sells 7 shares. Now, 69 shares remain as seen in 'Remain_Qtd'. An average price is calculated for those remaining shares - that price is $106.5652. Now, the process considers all 69 shares to be priced at $106.5652 and the remaining sales reduce the quantity of 'Remain_Qtd', but does not change the 'Remain_Price'. The remaining shares can no longer be considered at the price that they were bought at, they are collectively part of the remaining shares and the average remaining price.

This occurs because of the object dfo and the recycling of dfo in the object sdf. In particular, this line calculates an average remaining price that is then recycled through dfo and sdf.

Price = (sum(ip * iq) - v) / sum(sdf$Quantity)

and Quantity = sum(sdf$Quantity) adds together all the remaining shares.

I think the answer by @DPH is brilliant, but hope that it can be modified to treat each purchase individually rather than aggregating past purchases.

回复收藏 0 原文

独孤求败 2025-01-25 19:37:23

如果我正确理解您的问题，这是一种可能的解决方案。在简历中，我将销售额和购买数据组合在一起，并将其分组到销售块中（由销售ID给出）。这假设销售ID的顺序是根据日期列的。然后，我顺序循环浏览这些销售块，并将中间结果写入单个数据框架。对于每个销售块处理，将对同一股票的最后一个销售块结果过滤此结果数据框。这意味着销售量不得根据时间表大于可用数量（因为您无法出售自己没有的东西，无论如何，我必须将其指出为可能的限制）

提议的循环解决方案1不是在R中使用数据的最佳方法，因为它是一个循环，它会生长一个data.frame。由于您列出了purrr标签，因此我为答案的第二部分调整了与Map（）函数一起工作的代码。

在进入实际编码之前，让我们首先准备数据（以相同方式对答案的两个部分需要）：

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup()

1标准循环

dfo <- df[0, ] # empty copy of df

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}

# format, join original data and calculate result per Sales block
dplyr::select(dfo, Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

2循环循环恢复为purrr解决方案（请注意全局分配操作员（＆lt;＆lt; - 而不是）＆lt ;-)用于在功能结束时分配DFO）

# rephrase loop as function
myfun <- function(i){
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <<- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ]

purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[max(df$TID)]] %>% 
    dplyr::select(Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

编辑

要跟踪其余股票，我们需要第二个DF来保存当前的投资组合数据。我没有优化代码并仅编辑循环，但是purrr适应应非常直接。

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup() 

dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Price", "Quantity", "io")] # to hold current stock aka portfolio

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # bind data from current portfolio to buys between last and current sale (new port folio before sale)
    sdfh <- rbind(dfh[dfh$Ticker == t, ],
                  df[df$TID == i & df$io == "i", c("Ticker", "Price", "Quantity", "io")])
              
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    ips <- ip
    iqs <- iq
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            iqs[1] <- iqs[1] - o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
            ips <- ips[-1]
            iqs <- iqs[-1]
        }
    } 
    dfh <- rbind(dfh[dfh$Ticker != t, ],
                 data.frame(Ticker = t, 
                            Price = ips, 
                            Quantity = iqs, 
                            io = "i"))
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = sum(ips * iqs) / sum(iqs), 
                            Quantity = sum(iqs), 
                            io = "i",
                            hprice = v/o1))
}

dfo
  TID Ticker       Date    Price Quantity io hprice
1   1   MSFT 2020-01-07 106.5652       69  i    115
2   2   MSFT 2020-01-20 105.0000       57  i    114
3   3   MSFT 2020-01-21 104.8750       56  i    112
4   4   MSFT 2020-01-22 104.1765       51  i    112

If I understood your problem correctly this is one possible solution. In resume I am combinig the sales and buys data and group it in sales blocks (given by the sales ID). This assumes that the order of sales IDs is according to the date column. I then loop over these sales blocks sequentially and write the intermediate result to a individual dataframe. For each sales block processing this result dataframe is filtered for the last sales block result of the same ticker. This means sales quantity must not be larger than available quantity according to the timeline (since you can not sell what you not have this should not be of concern anyhow I have to point it out as a possible limitation)

The proposed loop solution 1 is not the best way to work data in R since it is a loop, which grows a data.frame. Since you listed the purrr tag I adapted the code for the second part of the answer to work with the map() function.

Before we get to the actual coding lets prepare the data first (need for both parts of the answer the same way):

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup()

1 Standard loop

dfo <- df[0, ] # empty copy of df

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}

# format, join original data and calculate result per Sales block
dplyr::select(dfo, Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

2 loop rephrase as purrr solution (be aware of the global assignment operartor (<<- instead of <-) for assignment of dfo at end of function)

# rephrase loop as function
myfun <- function(i){
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdf[sdf$io == "i", ] %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
        }
    } 
    # fill sales block frame and bind to output df
    dfo <<- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = (sum(ip * iq) - v) / sum(sdf$Quantity), 
                            Quantity = sum(sdf$Quantity), 
                            io = "i", 
                            hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ]

purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[max(df$TID)]] %>% 
    dplyr::select(Ticker, Date, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
    dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
    dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

  Ticker       Date Remain_Price Remain_Qtd Sales_Cost Sales_Price Sales_Qtd Result
1   MSFT 2020-01-07     100.0000          5   101.3333          97       -15    -65
2   MSFT 2020-01-20     100.0000          3   103.7500         110       -12     75
3   AMZN 2020-01-01    2000.0000          1  2010.0000        2100        -1     90
4   AMZN 2020-01-30          NaN          0  2007.3333        2050        -3    128
5   DOCU 2020-01-15     197.0000          2   197.0000         210       -10    130
6   DOCU 2020-04-10     173.6667         27   188.0000         205        -5     85
7   DOCU 2020-04-20       0.0000         -1   131.3333         225        -3    281

EDIT

To keep track of the remaining stocks we need a second df to hold the current portfolio data. I did not optimize the code and editted only the loop, the purrr adaption should be pretty straight foreward though.

library(tidyverse) # need dplyr, using lubridate for date parsing and tidyr to fill data.frame

# include identifier variable, manipulate column name and union sells and buy
df <- dplyr::mutate(buy, io = "i") %>% 
    dplyr::rename(TID  = BuyTransactionID) %>% 
    dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                     dplyr::rename(TID = SellTransactionID)) %>% 
    # sort the data
    dplyr::arrange(Ticker, Date) %>% 
    # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
    dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                  TID = ifelse(io == "i", NA, TID),
                  Date = lubridate::mdy(Date),
                  hprice = NA_real_) %>% 
    # group data to fill backwards per group
    dplyr::group_by(Ticker) %>% 
    tidyr::fill(TID, .direction = "up") %>%
    # ungroup to prevent unwanted behaviour downstream
    dplyr::ungroup() 

dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Price", "Quantity", "io")] # to hold current stock aka portfolio

for (i in sort(unique(df$TID))) {
    # which ticker are we working with at this sale
    t <- unique(df[df$TID == i, ]$Ticker)
    # bind data from last sale of this ticker to current sale
    sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
                 df[df$TID == i, ])
    # bind data from current portfolio to buys between last and current sale (new port folio before sale)
    sdfh <- rbind(dfh[dfh$Ticker == t, ],
                  df[df$TID == i & df$io == "i", c("Ticker", "Price", "Quantity", "io")])
              
    # current sales quantiy as positive value
    o1 <- abs(sdf[sdf$io == "o", ]$Quantity)
    # copy to use for greedy algo
    o2 <- o1
    # vectors of price and qtd of bought shares at this sale, having price in decreasing order
    ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
    iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
    ips <- ip
    iqs <- iq
    # total value of shares in greedy
    v <- 0
    # loop to run over bought prices and quantities do reduce from sold qtd per sales block
    # check if shares block is larger then remaining sales qtd to break loop
    for(l in 1:length(ip)){
        if(o2 < iq[l]){
            v <- v + ip[l] * o2
            iqs[1] <- iqs[1] - o2
            break
        }else{
            o2 <- o2 - iq[l]
            v <- v + ip[l] * iq[l] 
            ips <- ips[-1]
            iqs <- iqs[-1]
        }
    } 
    dfh <- rbind(dfh[dfh$Ticker != t, ],
                 data.frame(Ticker = t, 
                            Price = ips, 
                            Quantity = iqs, 
                            io = "i"))
    # fill sales block frame and bind to output df
    dfo <- rbind(dfo,
                 data.frame(TID = i,
                            Ticker = t, 
                            Date = max(sdf$Date),
                            Price = sum(ips * iqs) / sum(iqs), 
                            Quantity = sum(iqs), 
                            io = "i",
                            hprice = v/o1))
}

dfo
  TID Ticker       Date    Price Quantity io hprice
1   1   MSFT 2020-01-07 106.5652       69  i    115
2   2   MSFT 2020-01-20 105.0000       57  i    114
3   3   MSFT 2020-01-21 104.8750       56  i    112
4   4   MSFT 2020-01-22 104.1765       51  i    112

回复收藏 0 原文

九局 2025-01-25 19:37:23

这是我在@DPH 的帮助下得出的最终工作解决方案。我对@DPH 编辑的解决方案做了一些更改。

当股票的所有股票由于多种原因（包括 dfh 对象）而被出售时，编辑的解决方案不起作用。更新解决方案确实适用于我提供的修改后的日期集，但不适用于原始日期集。我修改了答案，以便在出售所有股票时它可以工作。
我已修改结果以包含购买日期。这对于确定出售是长期还是短期资本收益非常重要。
我已从已购买但未出售的股票中删除了股票代码，因为这些股票会破坏
我应用的脚本 purrr 到更新的解决方案以避免循环。
我已将基本子集（即 df[]）更改为 dplyr 子集化（即 df %>% filter()）。出于某种原因，基本子集化导致我的实际数据集中出现具有 NA 值的行，即使它并没有导致示例数据集。 NA 行导致解决方案无法正常工作：

数据框准备：

df <- buy %>% filter(Ticker %in% unique(sell$Ticker)) %>% dplyr::mutate(io = "i") %>% 
  dplyr::rename(TID  = BuyTransactionID) %>% 
  dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                 dplyr::rename(TID = SellTransactionID)) %>%
  # sort the data
  dplyr::arrange(Ticker, Date) %>% 
  # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
  dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                TID = ifelse(io == "i", NA, TID),
                Date = lubridate::mdy(Date),
                hprice = NA_real_) %>% 
  dplyr::arrange(Ticker, Date) %>% 
  # group data to fill backwards per group
  dplyr::group_by(Ticker) %>% 
  tidyr::fill(TID, .direction = "up") %>%
  # ungroup to prevent unwanted behaviour downstream
  dplyr::ungroup()

df$Dates_bought <- NA

函数和 purrr：

# rephrase loop as function
myfun <- function(i){
  # which ticker are we working with at this sale
  t <- unique(df[df$TID == i, ]$Ticker)
  # bind data from last sale of this ticker to current sale
  sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
               df %>% filter(TID == i))
  
  sdfh <- rbind(dfh %>% filter(Ticker == t),
                df %>% filter(TID == i & io == "i") %>% select(c("Ticker", "Date", "Price", "Quantity",  "io")))
  # current sales quantiy as positive value
  o1 <- abs(sdf %>% filter(io == "o") %>% pull(Quantity))
  # copy to use for greedy algo
  o2 <- o1
  # vectors of price and qtd of bought shares at this sale, having price in decreasing order
  ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
  iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
  date <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Date)
  ips <- ip
  iqs <- iq
  dates <- date
  # total value of shares in greedy
  v <- 0
  # loop to run over bought prices and quantities do reduce from sold qtd per sales block
  # check if shares block is larger then remaining sales qtd to break loop. Modifications 
  # to make dates work properly. 
  for(l in 1:length(ip)){
    if(o2 < iq[l]){
      v <- v + ip[l] * o2
      iqs[1] <- iqs[1] - o2
      dates2 <- if(o2 == 0) dates else dates[-1]
      break
    }else{
      o2 <- o2 - iq[l]
      v <- v + ip[l] * iq[l] 
      ips <- ips[-1]
      iqs <- iqs[-1]
      dates <- dates[-1]
      dates2 <- dates
    }
  }
  # Needs to have the if else statements because when all shares are sold, the length
  # of dates, ips, and iqs is 0, whereas Ticker and io are length 1. 
  dfh <<- rbind(dfh[dfh$Ticker != t, ],
                data.frame(Ticker = if(length(ips) == 0) numeric(length = 0L) else t,
                           Date = dates,
                           Price = ips, 
                           Quantity = iqs, 
                           io = if(length(ips) == 0) numeric(length = 0L) else "i"))
  
  # fill sales block frame and bind to output df
  dfo <<- rbind(dfo,
                data.frame(TID = i,
                           Ticker = t, 
                           Date = max(sdf$Date),
                           Dates_bought = paste(date[seq(length(date)-length(dates2))], collapse = ","),
                           Price = sum(ips * iqs) / sum(iqs), 
                           Quantity = sum(iqs), 
                           io = "i", 
                           hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Date", "Price", "Quantity", "io")] # to hold current stock aka portfolio

hifo <- purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[length(unique(df$TID))]] %>% 
  dplyr::select(Ticker, Date, Dates_bought, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
  dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
  dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

如果有人有任何问题，请告诉我，我想将其放入闪亮的应用程序中，如果您有兴趣合作，请告诉我。

Here is the final working solution that I have come to with the help of @DPH. I have made a couple of changes to @DPH's edited solution.

The edited solution does not work when all the shares of a stock are sold for multiple reasons including the dfh object. The updates solution does work with the modified dateset that I provided but not the original dateset. I have modified the answer so that it works when all shares are sold.
I have modified the result to include the dates of the purchases. This will be important for determining whether the sale is long term or short term capital gains.
I have removed tickers from stocks that have been purchased but not sold, as those will break the script
I have applied purrr to the updated solution to avoid looping.
I have changed the base subsetting (i.e., df[]) to dplyr subsetting (i.e., df %>% filter(). For some reason the base subsetting was resulting in rows with NA values in my actual dataset even though it did not cause that in the sample dataset. The NA rows caused the solution not to work.

data frame prep:

df <- buy %>% filter(Ticker %in% unique(sell$Ticker)) %>% dplyr::mutate(io = "i") %>% 
  dplyr::rename(TID  = BuyTransactionID) %>% 
  dplyr::union(dplyr::mutate(sell, io = "o") %>% 
                 dplyr::rename(TID = SellTransactionID)) %>%
  # sort the data
  dplyr::arrange(Ticker, Date) %>% 
  # make Qtd negative for sales, remove buy IDs, parce Date and set up helper column
  dplyr::mutate(Quantity = ifelse(io == "o", -1 * Quantity, Quantity), 
                TID = ifelse(io == "i", NA, TID),
                Date = lubridate::mdy(Date),
                hprice = NA_real_) %>% 
  dplyr::arrange(Ticker, Date) %>% 
  # group data to fill backwards per group
  dplyr::group_by(Ticker) %>% 
  tidyr::fill(TID, .direction = "up") %>%
  # ungroup to prevent unwanted behaviour downstream
  dplyr::ungroup()

df$Dates_bought <- NA

function and purrr:

# rephrase loop as function
myfun <- function(i){
  # which ticker are we working with at this sale
  t <- unique(df[df$TID == i, ]$Ticker)
  # bind data from last sale of this ticker to current sale
  sdf <- rbind(dplyr::slice_max(dfo[dfo$Ticker == t, ], "Date"), 
               df %>% filter(TID == i))
  
  sdfh <- rbind(dfh %>% filter(Ticker == t),
                df %>% filter(TID == i & io == "i") %>% select(c("Ticker", "Date", "Price", "Quantity",  "io")))
  # current sales quantiy as positive value
  o1 <- abs(sdf %>% filter(io == "o") %>% pull(Quantity))
  # copy to use for greedy algo
  o2 <- o1
  # vectors of price and qtd of bought shares at this sale, having price in decreasing order
  ip <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Price)
  iq <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Quantity)
  date <- sdfh %>% dplyr::arrange(desc(Price)) %>% dplyr::pull(Date)
  ips <- ip
  iqs <- iq
  dates <- date
  # total value of shares in greedy
  v <- 0
  # loop to run over bought prices and quantities do reduce from sold qtd per sales block
  # check if shares block is larger then remaining sales qtd to break loop. Modifications 
  # to make dates work properly. 
  for(l in 1:length(ip)){
    if(o2 < iq[l]){
      v <- v + ip[l] * o2
      iqs[1] <- iqs[1] - o2
      dates2 <- if(o2 == 0) dates else dates[-1]
      break
    }else{
      o2 <- o2 - iq[l]
      v <- v + ip[l] * iq[l] 
      ips <- ips[-1]
      iqs <- iqs[-1]
      dates <- dates[-1]
      dates2 <- dates
    }
  }
  # Needs to have the if else statements because when all shares are sold, the length
  # of dates, ips, and iqs is 0, whereas Ticker and io are length 1. 
  dfh <<- rbind(dfh[dfh$Ticker != t, ],
                data.frame(Ticker = if(length(ips) == 0) numeric(length = 0L) else t,
                           Date = dates,
                           Price = ips, 
                           Quantity = iqs, 
                           io = if(length(ips) == 0) numeric(length = 0L) else "i"))
  
  # fill sales block frame and bind to output df
  dfo <<- rbind(dfo,
                data.frame(TID = i,
                           Ticker = t, 
                           Date = max(sdf$Date),
                           Dates_bought = paste(date[seq(length(date)-length(dates2))], collapse = ","),
                           Price = sum(ips * iqs) / sum(iqs), 
                           Quantity = sum(iqs), 
                           io = "i", 
                           hprice = v / o1))
}


# empty copy of df
dfo <- df[0, ] # empty copy of df
dfh <- df[0, c("Ticker", "Date", "Price", "Quantity", "io")] # to hold current stock aka portfolio

hifo <- purrr::map(sort(unique(df$TID)), ~ myfun(.x))[[length(unique(df$TID))]] %>% 
  dplyr::select(Ticker, Date, Dates_bought, Remain_Price = Price, Remain_Qtd = Quantity, Sales_Cost = hprice) %>%
  dplyr::left_join(dplyr::select(df, Ticker, Date, Sales_Price = Price, Sales_Qtd = Quantity), by = c("Ticker", "Date")) %>%
  dplyr::mutate(Result = abs(Sales_Qtd) * (Sales_Price - Sales_Cost))

Let me know if anyone has any issues. I'd like to get this into a shinyapp and maybe develop it more. Let me know if you're interested in collaborating.

回复收藏 0 原文

~没有更多了~