Managing data drift when using word2vec embeddings on Vertex AI
I am looking into moving my models from GCP's AI Platform to Vertex AI. My main motivation is that Vertex AI sends automatic email notifications when your data skews or drifts (https://cloud.google.com/vertex-ai/docs/model-monitoring/using-model-monitoring).
So if you start receiving dodgy data that doesn't resemble the training set, you get an email telling you which features (columns) of the data you are predicting on are drifting away from your training data.
However, I am unsure how this would work in my case, since my data is text that has been encoded using word2vec embeddings. My dataset therefore has 300 columns, but I don't know what feature each column refers to.
Is this sort of data drift analysis still useful in my particular case?
Thank you
Comments (1)
At the moment, Vertex AI Model Monitoring supports feature skew and drift detection for categorical and numerical features only. As you said, the embeddings cannot be traced back to the actual data, and the encodings themselves can be considered neither categorical nor numerical in this context.
Yes, it would be useful to perform drift analysis. There are several ways to account for drift in an NLP data set. You can take a look at this blog for more information on tackling such drift in NLP. Please note that this article is not officially supported by Google Cloud.
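As one possible illustration of drift analysis done directly on embeddings, here is a minimal sketch that treats each 300-dimensional vector as a whole instead of monitoring its columns one by one. It compares the centroid of the training embeddings with the centroid of the serving embeddings using cosine distance, then uses a permutation test to judge whether that distance is larger than chance. This is not part of Vertex AI Model Monitoring and not necessarily the approach the linked blog takes; the function names `centroid_cosine_distance` and `drift_p_value` are hypothetical, and the data is synthetic and only stands in for averaged word2vec document vectors.

```python
import numpy as np


def centroid_cosine_distance(train_emb, serving_emb):
    """Cosine distance between the mean vectors of two embedding sets."""
    a = train_emb.mean(axis=0)
    b = serving_emb.mean(axis=0)
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - float(cos_sim)


def drift_p_value(train_emb, serving_emb, n_permutations=200, seed=0):
    """Permutation test on the centroid distance.

    Returns the observed distance and the fraction of random re-splits of
    the pooled data whose distance is at least as large as the observed one.
    """
    rng = np.random.default_rng(seed)
    observed = centroid_cosine_distance(train_emb, serving_emb)
    pooled = np.vstack([train_emb, serving_emb])
    n_train = len(train_emb)
    at_least_as_large = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        d = centroid_cosine_distance(pooled[perm[:n_train]], pooled[perm[n_train:]])
        if d >= observed:
            at_least_as_large += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Synthetic stand-ins for embeddings (assumption: one 300-d vector per document).
rng = np.random.default_rng(42)
train = rng.normal(0.5, 1.0, size=(500, 300))
serving_same = rng.normal(0.5, 1.0, size=(500, 300))
serving_shifted = rng.normal(0.5, 1.0, size=(500, 300))
serving_shifted[:, :50] += 1.0  # simulate drift in 50 of the 300 dimensions

d_same, p_same = drift_p_value(train, serving_same)
d_shifted, p_shifted = drift_p_value(train, serving_shifted)
print(f"no drift:   distance={d_same:.4f}  p={p_same:.3f}")
print(f"with drift: distance={d_shifted:.4f}  p={p_shifted:.3f}")
```

A small p-value suggests the serving embeddings have moved away from the training embeddings as a group, even though no single column is interpretable. A centroid comparison only catches shifts in the mean direction; changes in spread or cluster structure would need a different statistic.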