Description of the problem
My goal is quite basic: to plot time series in an interactive plot. After some research I decided to give Altair a try.
There are already QGIS plugins for time-series visualisation, but as far as I'm aware, none for plotting time-series at vector-level, interactively clicking on a map and selecting a Polygon. So that's why I decided to go for a self-made solution using Altair, maybe combining it with Folium to add functionalities later on.
I'm totally new to the Altair library (as well as Vega and Vega-Lite), and quite new to data science and data visualisation as well... so apologies in advance for my ignorance!
There are already well-explained tutorials on how to plot time series with Altair (for example here, or on the official website). However, my study case has some particularities that, as far as I've seen, have not yet been addressed together.
The data is produced using the Python API for Google Earth Engine and preprocessed with Python and the pandas/geopandas libraries:
- In Google Earth Engine, a vegetation index (NDVI in the current case) is computed at pixel level for a certain region of interest (ROI). Then the function image.reduceRegions() is mapped across the ImageCollection to compute the mean NDVI in every polygon of a FeatureCollection element, which represents agricultural parcels. The resulting vector file is exported.
- In a JupyterLab environment, the data is loaded into a geopandas GeoDataFrame object and preprocessed (transposing the DataFrame and creating a datetime column, among other steps) so that the data is well shaped for time-series representation with Altair.
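As a rough illustration of the preprocessing step above (not the actual code used here), the following is a minimal pandas sketch. It assumes a hypothetical export with one row per parcel and one column per acquisition date; the parcel ids, dates and NDVI values are made up. A GeoDataFrame is a DataFrame subclass, so the same calls should apply once the geometry column is set aside.

```python
import pandas as pd

# Hypothetical export: one row per parcel, one column per acquisition date.
# Parcel ids and NDVI values are made up for illustration.
exported = pd.DataFrame(
    {
        "2019-01-05": [0.21, None],
        "2019-01-10": [None, 0.35],
        "2019-01-15": [0.27, 0.40],
    },
    index=pd.Index(["17811", "17812"], name="parcel"),
)

# Transpose so that dates become rows and parcels become columns.
wide = exported.T
wide.columns.name = None

# Turn the date labels into a proper datetime column named 'date'.
wide.index = pd.to_datetime(wide.index)
wide = wide.rename_axis("date").reset_index()

print(wide.dtypes)
```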
Data overview after preprocessing:

My "final" goal would be to show, in the same graphic, an interactive line plot with a set of lines, each representing an agricultural parcel, with parcels categorized by crop type in different colours, e.g. corn in green, wheat in yellow, pear trees in brown... (the information containing the crop type of each parcel can be added to the DataFrame by joining it with another DataFrame).
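A minimal sketch of that join, assuming the NDVI values are already in long format (one row per parcel and date) and that a separate lookup table holds one crop type per parcel. The column names parcel, ndvi and crop and all values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format NDVI table: one row per parcel and date.
ndvi = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2019-01-05", "2019-01-15", "2019-01-10", "2019-01-15"]
        ),
        "parcel": ["17811", "17811", "17812", "17812"],
        "ndvi": [0.21, 0.27, 0.35, 0.40],
    }
)

# Hypothetical lookup table with one crop type per parcel.
crops = pd.DataFrame({"parcel": ["17811", "17812"], "crop": ["corn", "wheat"]})

# A left join keeps every NDVI observation, even for parcels without a crop label.
labelled = ndvi.merge(crops, on="parcel", how="left")
print(labelled)
```

With a table shaped like this, an Altair chart could put crop on the color channel and parcel on the detail channel, so that each parcel keeps its own line while sharing a colour with other parcels of the same crop.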
I am thinking of something looking more or less like the following example, with the years in the legend replaced by the parcels, coloured by crop type:

But so far I haven't managed to make my data look this way... at all.
As you can see there are many nulls in the data (this is due to the application of a cloud masking function and to the fact that there are several Sentinel-2 orbits intersecting the ROI). I would like to just omit the null values for each column/parcel, but I don't know if this data configuration can pose problems (any advice on that?).
So far I got:

- The generation of the preceding graphic, for a single parcel, already takes around 23 seconds, which is something that maybe should/could be improved (how?).
- And more importantly, the expected line representing the item/polygon/parcel's values (NDVI) is not even shown in the plot (note that I chose a parcel containing rather few non-null values).
For sure I am doing many things wrong. Would be great to get some advice to solve (some of) them.
Sample of the data and code to reproduce the issue
Here's a text sample of the data in JSON format, and the code used to reproduce the issue is the following:
import pandas as pd
import geopandas as gpd
import altair as alt
df= pd.read_json(r"path\to\json\file.json")
df['date']= pd.to_datetime(df['date'])
print(df.dtypes)
df
Output:

lines=alt.Chart(df).mark_line().encode(
x='date:O',
y='17811:Q',
color=alt.Color(
'17811:Q', scale=alt.Scale(scheme='redyellowgreen', domain=(-1, 1)))
)
lines.properties(width=700, height=600).interactive()
Output:

Thanks in advance for your help!
Comments (1)
If I understand correctly, it is mostly the format of your dataframe that needs to be changed from wide to long, which you can do either via .melt in pandas or .transform_fold in Altair. With melt, the default names for the melted columns are 'variable' (the previous column names) and 'value' (the value for each column).
The gaps come from the NaNs; if you want Altair to interpolate missing values, you can drop the NaNs.
If you want to do it all in Altair, the following is equivalent to the last pandas example above (the transform uses 'key' instead of 'variable' as the name for the former columns). I also use an ordinal instead of a nominal type for the color encoding to show how to make the colors more similar to your example.