https://flyte.org/ says that it is
The Workflow Automation Platform for Complex, Mission-Critical Data and Machine Learning Processes at Scale
I went through quite a bit of documentation and I fail to see why it is "Data and Machine Learning". It seems to me that it is a workflow manager on top of a container orchestrator (here Kubernetes), where "workflow manager" means that I can define a Directed Acyclic Graph (DAG), and then the DAG nodes are deployed as containers and the DAG is run.
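To make the "define a DAG, then run its nodes" idea concrete, here is a minimal sketch using only the Python standard library. It is not Flyte's API: the node names, the `WORKFLOW` table, and `run_workflow` are hypothetical, and each node runs as an in-process function call where a real orchestrator would launch a container.

```python
from graphlib import TopologicalSorter


def extract():
    # Pretend to read some raw records.
    return [3, 1, 2]


def transform(rows):
    # Sort the records produced by the upstream node.
    return sorted(rows)


def load(rows):
    # Summarise the sorted records.
    return {"count": len(rows), "max": rows[-1]}


# Hypothetical workflow table: node name -> (function, upstream node names).
WORKFLOW = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_workflow(workflow):
    # Map each node to its predecessors, then visit nodes in dependency order.
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = workflow[name]
        # Feed each node the outputs of its upstream nodes.
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_workflow(WORKFLOW)
```

Everything a workflow manager adds on top of this (scheduling, retries, caching, container isolation) lives around the loop in `run_workflow`, which is why the core looks the same across tools.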
Of course this is useful and important for "Data and Machine Learning", but I might as well use it for any other microservice DAG. Except for features/details, how is this different from https://airflow.apache.org or other workflow managers (of which there are many)? There are even more specialized workflow managers for "Data and Machine Learning", e.g., https://spark.apache.org.
What should I keep in mind as a Software Architect?
That is a great question. You are right about one thing: at its core it is a serverless workflow orchestrator (serverless because it brings up the infrastructure to run the code). And yes, it can be used in many other situations, though it may not be the best tool for some systems, such as microservice orchestration.
But what really makes it good for ML and data orchestration is a combination of:
Features
workflow, use different libraries, models, inputs etc
Flyte understands dataframes and can translate between them (spark.dataFrame -> pandas.DataFrame -> Modin -> polars, etc.) without the user having to think about how to do it efficiently. It also supports things like tensors (correctly serialized), NumPy arrays, etc. Models can also be saved and retrieved from past executions, so it is in fact the model truth store.
e.g. Spark, MPI, SageMaker
For Admins
Integrations
etc., and many others on the roadmap
Community
Focused on ML-specific features
Roadmap
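The dataframe point above (tasks exchange data in different formats without the user writing the conversion) can be illustrated with a toy sketch. This is not how Flyte implements it and uses none of its API: `RowFrame`, `ColumnFrame`, `CONVERTERS`, and `pass_between_tasks` are hypothetical stand-ins, with two plain containers playing the role of two dataframe libraries.

```python
from dataclasses import dataclass


@dataclass
class RowFrame:
    rows: list  # list of dicts, one dict per record


@dataclass
class ColumnFrame:
    columns: dict  # column name -> list of values


def rows_to_columns(frame):
    # Pivot a list of records into per-column lists.
    names = list(frame.rows[0]) if frame.rows else []
    return ColumnFrame({n: [row[n] for row in frame.rows] for n in names})


def columns_to_rows(frame):
    # Pivot per-column lists back into a list of records.
    names = list(frame.columns)
    length = len(next(iter(frame.columns.values()), []))
    return RowFrame(
        [{n: frame.columns[n][i] for n in names} for i in range(length)]
    )


# Hypothetical registry: (produced type, wanted type) -> converter.
CONVERTERS = {
    (RowFrame, ColumnFrame): rows_to_columns,
    (ColumnFrame, RowFrame): columns_to_rows,
}


def pass_between_tasks(value, wanted_type):
    # The "engine" converts only when producer and consumer types differ.
    if isinstance(value, wanted_type):
        return value
    return CONVERTERS[(type(value), wanted_type)](value)


def produce() -> RowFrame:
    return RowFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}])


def consume(frame: ColumnFrame) -> int:
    return sum(frame.columns["x"])


# The consumer never sees the producer's format.
total = consume(pass_between_tasks(produce(), ColumnFrame))
```

The design point is that each task declares only the type it wants, and the conversion lives in one place between tasks instead of inside every task body.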
Hopefully this answers your question. Please also join the Slack community and help spread this information, and feel free to ask more questions.