Python CMS for creating a YouTube-like video site?

Posted on 2024-10-03 08:12:26

Is anyone aware of an open source CMS written in Python that I can use to make a site like YouTube?

Comments (6)

悲欢浪云 2024-10-10 08:12:26

Django is a good Python framework, as are CherryPy and Pylons. However, a framework is not a CMS.

An open source video CMS would be MediaCore.

Here is some info about how YouTube is built:

(source: Google Video)

Platform:

  1. Apache
  2. Python
  3. Linux (SuSE)
  4. MySQL
  5. psyco, a dynamic python->C compiler
  6. lighttpd for video instead of Apache

Webservers:

  1. NetScaler is used for load balancing and caching static content.
  2. Run Apache with mod_fastcgi.
  3. Requests are routed for handling by a Python application server.
  4. The application server talks to various databases and other information sources to get all the data and formats the HTML page.
  5. Can usually scale web tier by adding more machines.
  6. The Python web code is usually NOT the bottleneck; it spends most of its time blocked on RPCs.
  7. Python allows rapid flexible development and deployment. This is critical given the competition they face.
  8. Usually less than 100 ms page service times.
  9. Use psyco, a dynamic python->C compiler that uses a JIT compiler approach to optimize inner loops.
  10. For CPU-intensive activities like encryption, they use C extensions.
  11. Some pre-generated cached HTML for expensive-to-render blocks.
  12. Row level caching in the database.
  13. Fully formed Python objects are cached.
  14. Some data are calculated and sent to each application so the values are cached in local memory. This is an underused strategy. The fastest cache is in your application server and it doesn't take much time to send precalculated data to all your servers. Just have an agent that watches for changes, precalculates, and sends.
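
Point 14 can be sketched roughly as follows. This is only a minimal illustration, not how YouTube implemented it: `AppServer`, `PrecomputeAgent`, and the "top video" computation are hypothetical names invented here, and the "push" is a plain method call standing in for whatever network transport a real deployment would use.

```python
from typing import Callable, Dict, List, Optional


class AppServer:
    """Hypothetical stand-in for one application server with an in-process cache."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.local_cache: Dict[str, object] = {}

    def receive(self, key: str, value: object) -> None:
        # Pushed values land in local memory, so later reads need no network hop.
        self.local_cache[key] = value


class PrecomputeAgent:
    """Watches one source, recomputes when it changes, and pushes to every server."""

    def __init__(self, servers: List[AppServer], compute: Callable[[dict], object]) -> None:
        self.servers = servers
        self.compute = compute
        self._last_seen: Optional[dict] = None

    def poll(self, key: str, source: dict) -> bool:
        """Push a fresh value if the source changed; return True when a push happened."""
        snapshot = dict(source)
        if snapshot == self._last_seen:
            return False
        self._last_seen = snapshot
        value = self.compute(snapshot)
        for server in self.servers:
            server.receive(key, value)
        return True


servers = [AppServer("web1"), AppServer("web2")]
agent = PrecomputeAgent(servers, compute=lambda views: max(views, key=views.get))

views = {"cats": 10, "dogs": 7}
agent.poll("top_video", views)   # first push
views["dogs"] = 25
agent.poll("top_video", views)   # source changed, so the value is recomputed and pushed
print(servers[0].local_cache["top_video"])
```

Every server ends up holding the same precalculated value in its own memory, which is the "fastest cache" the point refers to.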

Video serving:

  1. Costs include bandwidth, hardware, and power consumption.

  2. Each video is hosted by a mini-cluster. Each video is served by more than one machine.

  3. Using a cluster means:

    • More disks serving content, which means more speed.
    • Headroom. If a machine goes down others can take over.
    • There are online backups.
  4. Servers use the lighttpd web server for video:

    • Apache had too much overhead.
    • Uses epoll to wait on multiple fds.
    • Switched from single process to multiple process configuration to handle more connections.
  5. Most popular content is moved to a CDN (content delivery network):

    • CDNs replicate content in multiple places. There's a better chance of content being closer to the user, with fewer hops, and content will run over a more friendly network.
    • CDN machines mostly serve out of memory because the content is so popular there's little thrashing of content into and out of memory.
  6. Less popular content (1-20 views per day) uses YouTube servers in various colo sites.

    • There's a long tail effect. A video may have a few plays, but lots of videos are being played. Random disk blocks are being accessed.
    • Caching doesn't do a lot of good in this scenario, so spending money on more cache may not make sense. This is a very interesting point. If you have a long tail product, caching won't always be your performance savior.
    • Tune RAID controller and pay attention to other lower level issues to help.
    • Tune memory on each machine so there's not too much and not too little.
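
The long-tail point can be made concrete with a toy simulation. This is a rough sketch under stated assumptions: an LRU cache of 100 entries, and two deliberately extreme access patterns invented for illustration (real traffic sits somewhere in between).

```python
from collections import OrderedDict
from typing import Iterable


def lru_hit_rate(accesses: Iterable[int], capacity: int) -> float:
    """Replay a sequence of video ids through an LRU cache and return the hit rate."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    total = 0
    for item in accesses:
        total += 1
        if item in cache:
            hits += 1
            cache.move_to_end(item)        # mark as most recently used
        else:
            cache[item] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Hot content: the same 50 videos requested over and over.
hot = [i % 50 for i in range(10_000)]
# Long tail: 10,000 distinct videos, each requested only twice, far apart in time.
tail = list(range(10_000)) * 2

hot_rate = lru_hit_rate(hot, capacity=100)
tail_rate = lru_hit_rate(tail, capacity=100)
print(f"hot: {hot_rate:.3f}  tail: {tail_rate:.3f}")
```

In this setup the hot workload is served almost entirely from cache, while the long-tail workload never hits at all, because each video has been evicted long before its second request arrives.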

Serving Video Key Points:

  1. Keep it simple and cheap.
  2. Keep a simple network path. Not too many devices between content and users. Routers, switches, and other appliances may not be able to keep up with so much load.
  3. Use commodity hardware. The more expensive the hardware gets, the more expensive everything else gets too (support contracts, for example). You are also less likely to find help on the net.
  4. Use simple common tools. They use most tools built into Linux and layer on top of those.
  5. Handle random seeks well (SATA, tweaks).

Serving Thumbnails:

  1. Surprisingly difficult to do efficiently.
  2. There are about 4 thumbnails for each video, so there are a lot more thumbnails than videos.
  3. Thumbnails are hosted on just a few machines.
  4. Saw problems associated with serving a lot of small objects:
    • Lots of disk seeks and problems with inode caches and page caches at OS level.
    • Ran into per directory file limit. Ext3 in particular. Moved to a more hierarchical structure. Recent improvements in the 2.6 kernel may improve Ext3 large directory handling up to 100 times, yet storing lots of files in a file system is still not a good idea.
    • A high number of requests/sec, as web pages can display 60 thumbnails on a page.
    • Under such high loads Apache performed badly.
    • Used squid (reverse proxy) in front of Apache. This worked for a while, but as load increased performance eventually decreased. Went from 300 requests/second to 20.
    • Tried using lighttpd, but running single-threaded it stalled. Ran into problems with multiprocess mode because each process would keep a separate cache.
    • With so many images, setting up a new machine took over 24 hours.
    • After rebooting a machine, it took 6-10 hours for the cache to warm up enough to avoid going to disk.
  5. To solve all their problems they started using Google's BigTable, a distributed data store:
    • Avoids small file problem because it clumps files together.
    • Fast, fault tolerant. Assumes it's working on an unreliable network.
    • Lower latency because it uses a distributed multilevel cache. This cache works across different colocation sites.
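
The intermediate step mentioned above, moving to a more hierarchical directory structure to stay under per-directory file limits, might look something like this. It is a sketch of the general technique only: the `/thumbs` root, the two-level fan-out, and the file naming are assumptions made here, not details from the talk, and it does not model the later move to BigTable.

```python
import hashlib
from pathlib import PurePosixPath


def thumbnail_path(video_id: str, index: int, root: str = "/thumbs") -> PurePosixPath:
    """Map a thumbnail to root/<2 hex chars>/<2 hex chars>/<video_id>_<index>.jpg."""
    # Hashing the id spreads files evenly over 256 * 256 directories,
    # so no single directory has to hold millions of entries.
    digest = hashlib.sha256(video_id.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[:2] / digest[2:4] / f"{video_id}_{index}.jpg"


for i in range(4):
    print(thumbnail_path("abc123", i))
```

All thumbnails of one video land in the same directory, which keeps them close together while still bounding the directory size.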

Databases:

  1. The Early Years
    • Used MySQL to store metadata like users, tags, and descriptions.
    • Served data off a monolithic RAID 10 volume with 10 disks.
    • They were living off credit cards, so they leased hardware. When they needed more hardware to handle load, it took a few days to order and get delivered.
    • They went through a common evolution: single server, went to a single master with multiple read slaves, then partitioned the database, and then settled on a sharding approach.
    • Suffered from replica lag. The master is multi-threaded and runs on a large machine so it can handle a lot of work. Slaves are single threaded and usually run on lesser machines and replication is asynchronous, so the slaves can lag significantly behind the master.
    • Updates cause cache misses, which go to disk, where slow I/O causes slow replication.
    • Using a replicating architecture, you need to spend a lot of money for incremental bits of write performance.
    • One of their solutions was to prioritize traffic by splitting the data into two clusters: a video watch pool and a general cluster. The idea is that people want to watch video, so that function should get the most resources. The social networking features of YouTube are less important, so they can be routed to a less capable cluster.
  2. The later years:
    • Went to database partitioning.
    • Split into shards with users assigned to different shards.
    • Spreads writes and reads.
    • Much better cache locality which means less IO.
    • Resulted in a 30% hardware reduction.
    • Reduced replica lag to 0.
    • Can now scale database almost arbitrarily.
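
Assigning users to shards can be sketched as below. The talk summary does not say how YouTube mapped users to shards; hashing the user id modulo the shard count is just one common approach, assumed here for illustration, and `shard_for_user` is a hypothetical helper.

```python
import zlib
from collections import Counter


def shard_for_user(user_id: str, num_shards: int) -> int:
    """Return a stable shard index in [0, num_shards) for the given user."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # crc32 is stable across processes and machines, unlike Python's built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(user_id.encode("utf-8")) % num_shards


# All reads and writes for a user go to the same shard, which spreads load
# across shards and improves cache locality on each one.
counts = Counter(shard_for_user(f"user{i}", 4) for i in range(1000))
print(sorted(counts.items()))
```

One trade-off of plain modulo hashing is that changing the number of shards remaps most users, so a lookup table from user to shard is a frequent alternative when shards need to be added later.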

Data Center Strategy

  1. Used managed hosting providers at first. Living off credit cards, so it was the only way.
  2. Managed hosting can't scale with you. You can't control hardware or make favorable networking agreements.
  3. So they went to a colocation arrangement. Now they can customize everything and negotiate their own contracts.
  4. Use 5 or 6 data centers plus the CDN.
  5. Videos come out of any data center. Not closest match or anything. If a video is popular enough it will move into the CDN.
  6. Video is bandwidth dependent, not really latency dependent. It can come from any colo.
  7. For images latency matters, especially when you have 60 images on a page.
  8. Images are replicated to different data centers using BigTable. Code looks at different metrics to know who is closest.
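
The last point, picking the closest replica for latency-sensitive images, could be sketched like this. The summary does not say which metrics the real code used; measured round-trip latency is assumed here, and the data center names are made up.

```python
from typing import Dict, Iterable


def closest_replica(latency_ms: Dict[str, float], replicas: Iterable[str]) -> str:
    """Pick the data center holding a replica with the lowest known latency."""
    candidates = [dc for dc in replicas if dc in latency_ms]
    if not candidates:
        raise LookupError("no replica with a known latency")
    return min(candidates, key=lambda dc: latency_ms[dc])


# Hypothetical measurements from one client region, in milliseconds.
latency_ms = {"dc-east": 18.0, "dc-west": 72.0, "dc-eu": 110.0}
print(closest_replica(latency_ms, ["dc-west", "dc-eu"]))
```

Video requests would skip this step entirely, since the summary notes that video is bandwidth dependent and can be served from any colo.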

痞味浪人 2024-10-10 08:12:26

You can also check out http://plumi.org, based on Plone.

耳根太软 2024-10-10 08:12:26

You might want to take a look at zencoder for video encoding too.

小巷里的女流氓 2024-10-10 08:12:26

Is MediaCore any good?

http://getmediacore.com/

"The open source video CMS for centralizing all of your video and podcasting needs"

幸福还没到 2024-10-10 08:12:26

Even though it is not a CMS, Django could be very useful.

凉城已无爱 2024-10-10 08:12:26

Django and Pylons are the two most popular Python frameworks that will allow you to rapidly build your own CMS and YouTube-like video hosting site.

Django
http://www.djangoproject.com/

Pylons
http://pylonshq.com/

Making your own site instead of relying on a CMS is really going to be your best bet, because you will have to figure out a lot of other things, like how to convert the uploaded video to FLV, which won't be part of the core CMS functionality. There are a lot of other considerations, like leveraging a cloud CDN to deliver your video content, that just don't exist out of the box in any framework that I can think of, even those written in different languages.
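
As a rough idea of what the FLV conversion step involves, here is a minimal sketch that shells out to the `ffmpeg` command-line tool. It assumes `ffmpeg` is installed and on the PATH; the helper names `flv_command` and `convert_to_flv` are invented for this example, and a real site would run the conversion in a background job rather than in the upload request.

```python
import subprocess
from pathlib import Path
from typing import List


def flv_command(source: Path, ffmpeg: str = "ffmpeg") -> List[str]:
    """Build the ffmpeg argument list that converts `source` to an .flv next to it."""
    target = source.with_suffix(".flv")
    # -y overwrites an existing output file, -i names the input,
    # -f flv forces the FLV container for the output.
    return [ffmpeg, "-y", "-i", str(source), "-f", "flv", str(target)]


def convert_to_flv(source: Path) -> Path:
    """Run the conversion and return the output path; raises if ffmpeg fails."""
    subprocess.run(flv_command(source), check=True)
    return source.with_suffix(".flv")


print(flv_command(Path("clip.mp4")))
```

Calling `convert_to_flv(Path("clip.mp4"))` would then produce `clip.flv` in the same directory, provided the input file exists and ffmpeg can decode it.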
