不即不离

不即不离 2025-02-21 00:49:38

In the accepted answer (https://stackoverflow.com/a/2334741/1501497), after the:

SET
Table_A.col1 = Table_B.col1,
Table_A.col2 = Table_B.col2

I would add:

OUTPUT deleted.*, inserted.*

What I usually do is put everything in a rolled-back transaction and use the OUTPUT clause: this way I see everything that is about to happen. When I am happy with what I see, I change the ROLLBACK into COMMIT.

I usually need to document what I did, so I use the "Results to Text" option when I run the rolled-back query, and I save both the script and the result of the OUTPUT. (Of course this is not practical if I changed too many rows.)
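The review-then-rollback workflow can be sketched with Python's built-in sqlite3 module (SQLite rather than SQL Server, and my own table names, purely for illustration; SQLite has no OUTPUT clause, so the changed rows are inspected with a SELECT while the transaction is still open):

```python
import sqlite3

# Autocommit mode so we control the transaction explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, col1 TEXT)")
con.execute("CREATE TABLE table_b (id INTEGER PRIMARY KEY, col1 TEXT)")
con.execute("INSERT INTO table_a VALUES (1, 'old'), (2, 'old')")
con.execute("INSERT INTO table_b VALUES (1, 'new'), (2, 'new')")

con.execute("BEGIN")
con.execute(
    "UPDATE table_a SET col1 = "
    "(SELECT b.col1 FROM table_b b WHERE b.id = table_a.id)"
)
# Inspect what the UPDATE did while the transaction is still open...
preview = con.execute("SELECT id, col1 FROM table_a ORDER BY id").fetchall()
print(preview)   # [(1, 'new'), (2, 'new')]
# ...then throw it away; swap rollback() for commit() once satisfied.
con.rollback()
after = con.execute("SELECT id, col1 FROM table_a ORDER BY id").fetchall()
print(after)     # [(1, 'old'), (2, 'old')]
```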

How do I UPDATE from a SELECT in SQL Server?

不即不离 2025-02-20 20:58:28

You can use a decorator that takes the function as an argument and is applied with @, like this:

def DetermineAge(func):
    def wrapper(*args, **kwargs):
        age = func(*args, **kwargs)
        if age < 2:
            print("Stage of Life: A Baby")
        elif age < 4:
            print("Stage of Life: A Toddler")
        elif age < 13:
            print("Stage of Life: A Kid")
        elif age < 20:
            print("Stage of Life: A Teenager")
        elif age < 65:
            print("Stage of Life: An Adult")
        elif age >= 65:
            print("Stage of Life: An Elder")
        else:
            print("Mistakes were made, please restart the program and try again.")
        return age
    return wrapper


@DetermineAge
def GetAge():
    age = int(input("Please enter your age: "))
    return age


age = GetAge()

How to have one function call another function (as an argument)?

不即不离 2025-02-20 20:33:19

Here are two ideas for how it could work :)

import discord
from discord.ext import commands

class Support(commands.Cog):

  def __init__(self, client):
    self.client = client

  @commands.Cog.listener()
  async def on_message(self, message):
    if self.client.user.mentioned_in(message):
      # The raw content contains the mention as e.g. "<@12345> hi",
      # so compare the first token against the bot's own mention string.
      checkMessage = message.content.split()
      if checkMessage and checkMessage[0] == self.client.user.mention:
        mention = discord.Embed(
          title = "The prefix is `,help`",
          colour = 0xeeffee
        )
        await message.channel.send(embed = mention)

def setup(client):
  client.add_cog(Support(client))

import discord
from discord.ext import commands

class Support(commands.Cog):

  def __init__(self, client):
    self.client = client

  @commands.Cog.listener()
  async def on_message(self, message):
    if self.client.user.mentioned_in(message):
      if message.content.startswith("@"):
        mention = discord.Embed(
          title = "The prefix is `,help`",
          colour = 0xeeffee
        )
        await message.channel.send(embed = mention)

def setup(client):
  client.add_cog(Support(client))

Mention prefix for a discord.py rewrite bot

不即不离 2025-02-20 19:19:41

This query is wrongly written. You can try to run the "right" query on each table first, then use an outer SELECT with SUM to get the final result you need. Code sample:

select state, sum(total) from (
select state, count(id) as total from tablea group by state
union
select state, count(id) as total from tableb group by state) as t3
group by state
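A runnable sketch of the same shape of query, using Python's sqlite3 module with made-up data. Note I've used UNION ALL here: plain UNION would silently collapse the two inner rows if both tables happened to produce the same (state, total) pair.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tablea (id INTEGER, state TEXT);
    CREATE TABLE tableb (id INTEGER, state TEXT);
    INSERT INTO tablea VALUES (1, 'NY'), (2, 'NY'), (3, 'CA');
    INSERT INTO tableb VALUES (4, 'NY'), (5, 'CA'), (6, 'CA');
""")
rows = con.execute("""
    select state, sum(total) from (
        select state, count(id) as total from tablea group by state
        union all
        select state, count(id) as total from tableb group by state
    ) as t3
    group by state
    order by state
""").fetchall()
print(rows)  # [('CA', 3), ('NY', 3)]
```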

How to efficiently UNION two GROUP BY queries over two different tables

不即不离 2025-02-20 17:58:18

Yes, you can achieve this with a VPC endpoint policy.

Here's an example from the documentation. This policy enables a specific IAM role to pull images from Amazon ECR:

{
    "Statement": [{
        "Sid": "AllowPull",
        "Principal": {
            "AWS": "arn:aws:iam::1234567890:role/role_name"
        },
        "Action": [
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetAuthorizationToken"
        ],
        "Effect": "Allow",
        "Resource": "*"
    }]
}

Allow pull from the ECR ecr.dkr VPC endpoint, but not push?

不即不离 2025-02-19 21:17:01

If you have an n×m matrix, UMAP.jl interprets it as m observations of n-dimensional data, i.e. every column in your input is a vector in your input space. Since umap reduces the dimensionality of your data, the number of rows in your input needs to be larger than the requested number of output components, n_components, which defaults to 2 and can't (at the moment) be 1 in UMAP.jl.

You have 86 observations of 2-dimensional data. Since the data is already 2-dimensional you can't use umap to reduce the dimensionality.

size(X, 1) must be greater than n_components; n_components must be greater than 1

不即不离 2025-02-19 12:34:33

General Overview:

import pandas as pd

# data_ reconstructed from the first frame printed in the output below
data_ = {'Number_a': [12, 13, 14, 15, 16, 17],
         'Number_b': [11, 11, 11, 12, 12, 12],
         'Number_c': [10, 5, 4, 3, 2, 1]}
data = pd.DataFrame(data=data_)
n = 2

def function_data(df, n):
    data = df.copy()
    idx1 = [0, 2, 4]
    idx2 = [1, 3, 5]
    ids = [idx1, idx2]
    for id in ids:
        print(data, '\n') # I print before each iteration just to show it's working.
        data.loc[id, 'Number_a'] = data.Number_a.mul(n)
        data.loc[id, 'Number_b'] = data.Number_b.add(data.Number_c)
    return data


data = function_data(data, n)
print(data, '\n')

Output:

   Number_a  Number_b  Number_c
0        12        11        10
1        13        11         5
2        14        11         4
3        15        12         3
4        16        12         2
5        17        12         1

   Number_a  Number_b  Number_c
0        24        21        10
1        13        11         5
2        28        15         4
3        15        12         3
4        32        14         2
5        17        12         1

   Number_a  Number_b  Number_c
0        24        21        10
1        26        16         5
2        28        15         4
3        30        15         3
4        32        14         2
5        34        13         1

Lists of indexes within a loop range

不即不离 2025-02-19 05:26:36

To do what you need, you will need to map each view value - 0 or 1 - to a color, so that the right color is used. This can be done using map. The legend handles need custom entries added, so that the blue and red colors are assigned and shown with the correct labels. I have used random numbers as data to plot the required graph, keeping as much of your code as-is.

Code

import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

size = []
price = []
view = []

for i in range(0,100):
    size.append(round(random.random(),3))
    price.append(round(random.random(),3))
    view.append(int(random.random()*10 % 2))
df = pd.DataFrame({'size':size, 'price':price, 'view':view})
colors = {0:'red', 1:'blue'}
plt.scatter(x=df['size'], y=df['price'], c=df['view'].map(colors))
plt.xlabel("Size", fontsize = 25, c = "green")
plt.ylabel("Price", fontsize = 25, c = "green")
custom = [Line2D([], [], marker='.', color='red', linestyle='None'),
          Line2D([], [], marker='.', color='blue', linestyle='None')]

plt.legend(handles = custom, labels=['No View', 'View'], bbox_to_anchor= (1.05, 0.5), loc= "lower left")
plt.show()

Output graph:

[image: scatter plot of price vs. size with the custom "No View" / "View" legend]

How to add a legend to a scatter plot in matplotlib where the points are colour-coded by an array of 0s and 1s?

不即不离 2025-02-18 19:26:31

The idea here is based on the following links: Link 1 and Link 2.

In both the cases, the largest possible rectangle is computed within a given polygon/shape. Check both the above links for details.

We can extend the idea above to the problem at hand.

Steps:

  1. Filter the image by color (say red)
  2. Find the largest possible rectangle in the red region. After doing so, mask it.
  3. Repeat to find the next biggest rectangle until all the portions in red have been covered.
  4. Repeat the above for every unique color.
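Step 2 above is the classic "largest rectangle in a binary mask" problem; here is a hedged sketch of one way to do it, using the histogram-plus-stack method (the function name and the (top, left, height, width) return convention are my own, not from the linked answers):

```python
import numpy as np

def largest_rectangle(mask):
    """Largest all-True axis-aligned rectangle in a 2-D boolean mask.
    Returns (top, left, height, width)."""
    best, best_area = (0, 0, 0, 0), 0
    heights = np.zeros(mask.shape[1], dtype=int)
    for r, row in enumerate(mask):
        # Column heights of consecutive True cells ending at row r.
        heights = np.where(row, heights + 1, 0)
        stack = []  # (start_col, height), heights strictly increasing
        for c, h in enumerate(np.append(heights, 0)):  # sentinel 0 flushes stack
            start = c
            while stack and stack[-1][1] >= h:
                start, sh = stack.pop()
                area = sh * (c - start)
                if area > best_area:
                    best_area = area
                    best = (int(r - sh + 1), int(start), int(sh), int(c - start))
            stack.append((start, h))
    return best

mask = np.array([[0, 1, 1, 0],
                 [1, 1, 1, 0],
                 [1, 1, 1, 1]], dtype=bool)
print(largest_rectangle(mask))  # (0, 1, 3, 2): rows 0-2, cols 1-2
```

Running this per colour mask, masking out each found rectangle, and repeating implements steps 2-4.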

Overview:

[image: overview of the filter-by-colour / largest-rectangle / mask steps]

Dividing an image by grouping similar pixels into rectangles

不即不离 2025-02-18 17:15:22

Using just requests and bs4 is hard but possible. Not entirely sure what information you are trying to parse, but this should help you:

import requests, lxml, re, json
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

# works with different countries, languages
params = {
    "q": "mcdonalds",
    "gl": "jp",
    "hl": "ja", # japanese
}

response = requests.get("https://www.google.com/search", headers=headers, params=params)
soup = BeautifulSoup(response.text, 'lxml')

local_results = []

for result in soup.select('.VkpGBb'):
  title = result.select_one('.dbg0pd span').text
  try:
      website = result.select_one('.yYlJEf.L48Cpd')['href']
  except:
      website = None

  try:
      directions = f"https://www.google.com{result.select_one('.yYlJEf.VByer')['data-url']}"
  except:
      directions = None
      
  address_not_fixed = result.select_one('.lqhpac div').text
  # removes phone number from "address_not_fixed" variable
  # https://regex101.com/r/cwLdY8/1
  address = re.sub(r' · ?.*', '', address_not_fixed)
  phone = ''.join(re.findall(r' · ?(.*)', address_not_fixed))
  
  try:
      hours = result.select_one('.dXnVAb').previous_element
  except:
      hours = None

  try:
      options = result.select_one('.dXnVAb').text.split('·')
  except:
      options = None

  local_results.append({
      'title': title,
      'phone': phone,
      'address': address,
      'hours': hours,
      'options': options,
      'website': website,
      'directions': directions,
  })

print(json.dumps(local_results, indent=2, ensure_ascii=False))

Here is the output that you will get back; hopefully this helps!

# English results:
   {
    "title": "McDonald's",
    "phone": "(620) 251-3330",
    "address": "Coffeyville, KS",
    "hours": " ⋅ Opens 5AM",
    "options": [
      "Curbside pickup",
      "Delivery"
    ],
    "website": "https://www.mcdonalds.com/us/en-us/location/KS/COFFEYVILLE/302-W-11TH/4581.html?cid=RF:YXT:GMB::Clicks",
    "directions": "https://www.google.com/maps/dir//McDonald's,+302+W+11th+St,+Coffeyville,+KS+67337/data=!4m6!4m5!1m1!4e2!1m2!1m1!1s0x87b784f6803e4c81:0xf5af9c9c89f19918?sa=X&hl=en&gl=us"
  }

How to scrape Google Maps in Python without using Selenium or any API?

不即不离 2025-02-18 07:56:14

It looks like you're trying to print prime numbers in a given range. Arguably, mixing discovery of prime numbers and printing them is what causes the problem. With proper decomposition, this problem won't exist:

def generate_primes(low, up):
    for num in range(max(low, 2), up+1):
       if all(num % i for i in range(2, num)):
           yield num

print(*generate_primes(low, up), sep=',')

As a positive side effect, you can now reuse the prime generator in other parts of the program which don't require printing.

Also note that checking all numbers up to num is not necessary: if the number is composite, one of the factors will be less than or equal to sqrt(num). So, a faster prime generator would be something like:

def generate_primes(low, up):
    for num in range(max(low, 2), up+1):
       if all(num % i for i in range(2, int(num**0.5 + 1))):
           yield num

How to remove the trailing comma when printing with end=','

不即不离 2025-02-18 00:17:18

Is the time as a string OK? Because then you can use substr to extract the hour and minutes like so:

time <- c("2022-05-23 23:02:58", "2022-05-23 13:52:58", "2022-05-23 03:31:58", "2022-05-23 09:09:58")
n <- nchar(time)
hour <- substr(time, n - 7, n - 3)

Just the time, with your 100,000-row time column
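For comparison, the same tail-anchored extraction sketched in Python (my own translation of the idea, not part of the original R answer):

```python
times = ["2022-05-23 23:02:58", "2022-05-23 13:52:58",
         "2022-05-23 03:31:58", "2022-05-23 09:09:58"]
# Count from the end of the string, like substr(time, n - 7, n - 3) in R.
hours = [t[-8:-3] for t in times]
print(hours)  # ['23:02', '13:52', '03:31', '09:09']
```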

Data frames and data

不即不离 2025-02-17 09:52:11

So, it turns out my issue stemmed from having previously set up FORGE_CLIENT_ID and FORGE_CLIENT_SECRET system variables during an earlier, unsuccessful trial of Forge. This meant that the Visual Studio solution I created was reading those values instead of the ones I had input in code. Thanks to Cyrille Fauvel of Autodesk for helping me figure this out.

Cyrille says he is going to feed back to the rest of the Forge team that they ought to report the details of both the ID and the SECRET on the console, to prevent this happening in future.
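The pitfall generalizes to any SDK that silently falls back to environment variables. A hedged sketch of making the resolution order visible (plain Python with illustrative names; this is not the Forge SDK's actual configuration API):

```python
import os

def resolve_credential(name, explicit=None):
    """Return (value, source) so the caller can log where a credential came from."""
    if explicit is not None:
        return explicit, "code"
    value = os.environ.get(name)
    if value is not None:
        return value, f"environment variable {name}"
    raise KeyError(f"{name} is not configured anywhere")

# A stale value left over from an old trial would otherwise win silently.
os.environ["FORGE_CLIENT_ID"] = "stale-id-from-old-trial"
print(resolve_credential("FORGE_CLIENT_ID"))                     # env-var fallback
print(resolve_credential("FORGE_CLIENT_ID", explicit="new-id"))  # code wins, and says so
```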

The specified client_id does not have access to the API product

不即不离 2025-02-16 22:24:22

The schema registry will only block producers that are configured to use it. By default, the broker will not enforce a schema. For that, you need to pay for Confluent Server.

Hard to tell what image you are showing (what REST endpoint you are using), but if it is the Confluent Kafka REST Proxy, then refer to quick-start section on producing and consuming Avro.

# Produce a message using Avro embedded data, including the schema which will
# be registered with schema registry and used to validate and serialize
# before storing the data in Kafka
curl -X POST -H "Content-Type: application/vnd.kafka.avro.v2+json" \
      -H "Accept: application/vnd.kafka.v2+json" \
      --data '{"value_schema": "{\"type\": \"record\", \"name\": \"User\", \"fields\": [{\"name\": \"name\", \"type\": \"string\"}]}", "records": [{"value": {"name": "testUser"}}]}' \
      "http://localhost:8082/topics/avrotest"

Without these specifics, you're just sending plain-text JSON, which will do no schema checks.

If you have direct access to the Kafka cluster, then writing an Avro producer client would be easier since you don't need to embed the key and/or value schemas every time you want to send an event.

Kafka Schema Registry - blocking messages that are not accepted by the schema registry in Kafka

不即不离 2025-02-16 18:08:25

You can flatten the monthly/total columns via explode as shown below:

val df = Seq(
  ("Micheal", "Scott", "[email protected]", 4000, 5000, 3400, 50660),
  ("Dwight", "Schrute", "[email protected]", 1200, 6900, 1000, 35000),
  ("Kevin", "Malone", "[email protected]", 9000, 6000, 18000, 32000)
).toDF("FName","SName", "Email", "Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")

val moYrCols = Array("Jan 2021", "Feb 2021", "Mar 2021", "Total 2021")  // (**)
val otherCols = df.columns diff moYrCols
val structCols = moYrCols.map{ c =>
    val moYr = split(lit(c), "\\s+")
    struct(moYr(1).as("Year"), moYr(0).as("Month"), col(c).as("Value"))
  }

df.
  withColumn("flattened", explode(array(structCols: _*))).
  select(otherCols.map(col) :+ $"flattened.*": _*).
  show
/*
+-------+-------+------------------+----+-----+-----+
|  FName|  SName|             Email|Year|Month|Value|
+-------+-------+------------------+----+-----+-----+
|Micheal|  Scott| [email protected]|2021|  Jan| 4000|
|Micheal|  Scott| [email protected]|2021|  Feb| 5000|
|Micheal|  Scott| [email protected]|2021|  Mar| 3400|
|Micheal|  Scott| [email protected]|2021|Total|50660|
| Dwight|Schrute|[email protected]|2021|  Jan| 1200|
| Dwight|Schrute|[email protected]|2021|  Feb| 6900|
| Dwight|Schrute|[email protected]|2021|  Mar| 1000|
| Dwight|Schrute|[email protected]|2021|Total|35000|
|  Kevin| Malone| [email protected]|2021|  Jan| 9000|
|  Kevin| Malone| [email protected]|2021|  Feb| 6000|
|  Kevin| Malone| [email protected]|2021|  Mar|18000|
|  Kevin| Malone| [email protected]|2021|Total|32000|
+-------+-------+------------------+----+-----+-----+
*/

(**) Use pattern matching in case there are many columns; for example:

val moYrCols = df.columns.filter(_.matches("[A-Za-z]+\\s+\\d{4}"))
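For comparison, the same unpivot sketched in pandas (my own translation, on a subset of the data above; melt plays the role of the explode-over-structs trick):

```python
import pandas as pd

df = pd.DataFrame({
    "FName": ["Micheal", "Dwight"],
    "SName": ["Scott", "Schrute"],
    "Jan 2021": [4000, 1200],
    "Feb 2021": [5000, 6900],
})
# Stack the month/year columns into rows, then split "Jan 2021" into parts.
long = df.melt(id_vars=["FName", "SName"], var_name="MoYr", value_name="Value")
long[["Month", "Year"]] = long["MoYr"].str.split(" ", expand=True)
long = long.drop(columns="MoYr")
print(long)
```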

Scala unpivot table
