Extract matrices from a DataFrame by the values in a column

Posted on 2025-01-09 07:17:29

I am trying something that may be a little hard to understand, but I will try to be very specific.

I have a pandas DataFrame like this:

Locality                          Count       Lat.        Long.
Krasnodar                         Russia      44          39
Tirana                            Albania     41.33       19.83
Areni                             Armenia     39.73       45.2
Kars                              Armenia     40.604517   43.100758
Brunn Wolfholz                    Austria     48.120396   16.291722
Kleinhadersdorf Flur Marchleiten  Austria     48.663197   16.589687
Jalilabad district                Azerbaijan  39.3607139  48.4613556
Zeyem Chaj                        Azerbaijan  40.9418889  45.8327778
Jalilabad district                Azerbaijan  39.5186111  48.65

And a file, cities.txt, with the names of some countries:

Albania 
Armenia
Austria
Azerbaijan

And so on.
Next, I convert the Lat. and Long. values to radians and then, with the values from the list, do something like:

with open('cities.txt') as file:
  lines=file.readlines()
  x=np.where(df['Count'].eq(lines),pd.DataFrame(
  dist.pairwise(df[['Lat.','Long.']].to_numpy())*6373,
    columns=df.Locality.unique(), index=df.Locality.unique()))

Here, pd.DataFrame(dist.pairwise(df[['Lat.','Long.']].to_numpy())*6373, columns=df.Locality.unique(), index=df.Locality.unique()) converts the Lat. and Long. values (in radians) into distances in km and creates a DataFrame as a matrix for each line (country).
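For reference, the distance computation this relies on can be sketched in plain Python, assuming dist is a haversine metric. The function name haversine_km is illustrative and not part of the original code; the radius 6373 is the factor used in the snippet above.

```python
import math

EARTH_RADIUS_KM = 6373  # the factor used in the snippet above


def haversine_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance in km between two points given in degrees."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    # Clamp to 1 to guard against rounding slightly above the valid range
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


# Areni and Kars, the two Armenian localities from the sample data
print(haversine_km(39.73, 45.2, 40.604517, 43.100758))
```

A pairwise matrix is this function evaluated for every pair of rows, which is why its diagonal is always 0.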

In the end I will have a lot of 2D matrices (in theory), grouped by country, and I want to apply this:

>>>Russia.min()
0
>>>Russia.max()
5

to get the .min() and .max() values of each matrix and save the results in cities.txt as

Country     Max.Dist.  Min.Dist.
Albania     5          1
Armenia     10         9
Austria     5          3
Azerbaijan  0          0

Unfortunately, 1) I'm stuck on the first part, where I get the error ValueError: Lengths must be equal. 2) Is it possible to have these matrices grouped by country, and 3) how can I save my .min() and .max() values?


Comments (1)

遇见了你 2025-01-16 07:17:29

I am not sure exactly what you want as the minimum. In this solution, the minimum is 0 if a country has only one city; otherwise it is the shortest distance between two cities within the country. Also, the file cities.txt seems to be just a filter. I didn't implement that, but it seems straightforward.

import numpy as np
import pandas as pd

Here is just some sample data:

cities = pd.read_json("https://raw.githubusercontent.com/lutangar/cities.json/master/cities.json")
cities = cities.sample(10000)

Create and apply a custom aggregate for groupby():

from sklearn.metrics import DistanceMetric
dist = DistanceMetric.get_metric('haversine')

country_groups = cities.groupby("country")

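def city_distances(group):
    # Coordinates of every city in this country, in degrees
    geo = group[['lat', 'lng']]

    EARTH_RADIUS = 6371  # km

    # The haversine metric expects [lat, lng] in radians and returns
    # distances on the unit sphere, so scale by the Earth's radius
    haversine_distances = dist.pairwise(np.radians(geo))
    haversine_distances *= EARTH_RADIUS

    distances = {}
    distances['max'] = np.max(haversine_distances)

    # Default to 0 when there is only one city; otherwise take the
    # smallest non-zero distance (this skips the zero diagonal)
    distances['min'] = 0
    nonzero = haversine_distances[np.nonzero(haversine_distances)]
    if len(nonzero) > 0:
        distances['min'] = np.min(nonzero)

    return pd.Series(distances)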

country_groups.apply(city_distances)

In my case this prints something like

                 max         min
country                         
AE        323.288482  323.288482
AF       1130.966661   15.435642
AI         12.056890   12.056890
AL        272.300688    3.437074
AM        268.051071    1.328605
...              ...         ...
YE        662.412344   19.103222
YT          3.723376    3.723376
ZA       1466.334609   24.319334
ZM       1227.429001  218.566369
ZW        503.562608   26.316902

[194 rows x 2 columns]
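
As a rough sketch of the cities.txt filter mentioned above, here is the same idea applied to the sample rows from the question. Several things are assumptions for illustration: the helper names pairwise_haversine_km and min_max_distance, the io.StringIO stand-in for cities.txt, and a NumPy haversine in place of sklearn's DistanceMetric so the snippet has fewer dependencies. The ValueError in the question most likely comes from Series.eq comparing element-wise against a list of a different length; isin() tests membership instead. Unlike the function above, this version ignores only the diagonal rather than every zero distance.

```python
import io

import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same value as in the answer above

# Sample rows copied from the question (column names as in the question)
df = pd.DataFrame({
    "Locality": ["Krasnodar", "Tirana", "Areni", "Kars", "Brunn Wolfholz",
                 "Kleinhadersdorf Flur Marchleiten", "Jalilabad district",
                 "Zeyem Chaj", "Jalilabad district"],
    "Count": ["Russia", "Albania", "Armenia", "Armenia", "Austria", "Austria",
              "Azerbaijan", "Azerbaijan", "Azerbaijan"],
    "Lat.": [44, 41.33, 39.73, 40.604517, 48.120396, 48.663197,
             39.3607139, 40.9418889, 39.5186111],
    "Long.": [39, 19.83, 45.2, 43.100758, 16.291722, 16.589687,
              48.4613556, 45.8327778, 48.65],
})

# Stand-in for open('cities.txt'); strip() removes newlines and stray spaces
country_file = io.StringIO("Albania \nArmenia\nAustria\nAzerbaijan\n")
wanted = [line.strip() for line in country_file if line.strip()]


def pairwise_haversine_km(lat_deg, lon_deg):
    """Square matrix of great-circle distances in km (inputs in degrees)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0, 1)))


def min_max_distance(group):
    d = pairwise_haversine_km(group["Lat."], group["Long."])
    # Drop the diagonal (distance from a locality to itself)
    off_diag = d[~np.eye(len(d), dtype=bool)]
    if off_diag.size == 0:  # only one locality in this country
        return pd.Series({"Max.Dist.": 0.0, "Min.Dist.": 0.0})
    return pd.Series({"Max.Dist.": off_diag.max(), "Min.Dist.": off_diag.min()})


# isin() tests membership, so the two lengths do not have to match
selected = df[df["Count"].isin(wanted)]
result = selected.groupby("Count")[["Lat.", "Long."]].apply(min_max_distance)
result.index.name = "Country"

# Pass a file path as the first argument to to_csv() to write to disk instead
print(result.to_csv(sep="\t"))
```

Russia is dropped because it is not listed in the file, and Albania gets 0 for both values because it has a single locality.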