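Server crashes when interrupted while sending a large chunk of data
My server crashes when I gracefully close a client that is connected to it while that client is receiving a large chunk of data. I suspect a lifetime bug, as with most bugs in Boost Asio, but I have not been able to pinpoint my mistake.
Each client establishes 2 connections with the server: one is for syncing, and the other is a long-lived connection used to receive continuous updates. In the "sync phase", a client receives a large amount of data to sync with the server state (the "state" is basically DB data in JSON format). After syncing, the sync connection is closed. The client then receives DB updates over the other connection (these are, of course, very small compared to the "sync data").
These are the relevant files: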
connection.h
#pragma once
#include <array>
#include <memory>
#include <string>
#include <boost/asio.hpp>

class ConnectionManager;

/// Represents a single connection from a client.
class Connection : public std::enable_shared_from_this<Connection>
{
public:
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

    /// Construct a connection with the given socket.
    explicit Connection(boost::asio::ip::tcp::socket socket, ConnectionManager& manager);

    /// Start the first asynchronous operation for the connection.
    void start();

    /// Stop all asynchronous operations associated with the connection.
    void stop();

    /// Perform an asynchronous write operation.
    void do_write(const std::string& buffer);

    int getNativeHandle();

    ~Connection();

private:
    /// Perform an asynchronous read operation.
    void do_read();

    /// Socket for the connection.
    boost::asio::ip::tcp::socket socket_;

    /// The manager for this connection.
    ConnectionManager& connection_manager_;

    /// Buffer for incoming data.
    std::array<char, 8192> buffer_;

    std::string outgoing_buffer_;
};

typedef std::shared_ptr<Connection> connection_ptr;
connection.cpp
#include "connection.h"
#include <utility>
#include <vector>
#include <iostream>
#include <thread>
#include "connection_manager.h"

Connection::Connection(boost::asio::ip::tcp::socket socket, ConnectionManager& manager)
    : socket_(std::move(socket))
    , connection_manager_(manager)
{
}

void Connection::start()
{
    do_read();
}

void Connection::stop()
{
    socket_.close();
}

Connection::~Connection()
{
}

void Connection::do_read()
{
    auto self(shared_from_this());
    socket_.async_read_some(boost::asio::buffer(buffer_), [this, self](boost::system::error_code ec, std::size_t bytes_transferred) {
        if (!ec) {
            std::string buff_str = std::string(buffer_.data(), bytes_transferred);
            const auto& tokenized_buffer = split(buff_str, ' ');
            if(!tokenized_buffer.empty() && tokenized_buffer[0] == "sync") {
                /// "syncing connection" sends a specific text
                /// hence I can separate between syncing and long-lived connections here and act accordingly.
                const auto& exec_json_strs = getExecutionJsons();
                const auto& order_json_strs = getOrdersAsJsons();
                const auto& position_json_strs = getPositionsAsJsons();
                const auto& all_json_strs = exec_json_strs + order_json_strs + position_json_strs + createSyncDoneJson();
                /// this is potentially a very large data.
                do_write(all_json_strs);
            }
            do_read();
        } else {
            connection_manager_.stop(shared_from_this());
        }
    });
}

void Connection::do_write(const std::string& write_buffer)
{
    outgoing_buffer_ = write_buffer;
    auto self(shared_from_this());
    boost::asio::async_write(socket_, boost::asio::buffer(outgoing_buffer_, outgoing_buffer_.size()), [this, self](boost::system::error_code ec, std::size_t transfer_size) {
        if (!ec) {
            /// everything is fine.
        } else {
            /// what to do here?
            /// server crashes once I get error code 32 (EPIPE) here.
        }
    });
}
connection_manager.h
#pragma once
#include <set>
#include "connection.h"

/// Manages open connections so that they may be cleanly stopped when the server
/// needs to shut down.
class ConnectionManager
{
public:
    ConnectionManager(const ConnectionManager&) = delete;
    ConnectionManager& operator=(const ConnectionManager&) = delete;

    /// Construct a connection manager.
    ConnectionManager();

    /// Add the specified connection to the manager and start it.
    void start(connection_ptr c);

    /// Stop the specified connection.
    void stop(connection_ptr c);

    /// Stop all connections.
    void stop_all();

    void sendAllConnections(const std::string& buffer);

private:
    /// The managed connections.
    std::set<connection_ptr> connections_;
};
connection_manager.cpp
#include "connection_manager.h"

ConnectionManager::ConnectionManager()
{
}

void ConnectionManager::start(connection_ptr c)
{
    connections_.insert(c);
    c->start();
}

void ConnectionManager::stop(connection_ptr c)
{
    connections_.erase(c);
    c->stop();
}

void ConnectionManager::stop_all()
{
    for (auto c: connections_)
        c->stop();
    connections_.clear();
}

/// this function is used to keep clients up to date with the changes, not used during syncing phase.
void ConnectionManager::sendAllConnections(const std::string& buffer)
{
    for (auto c: connections_)
        c->do_write(buffer);
}
server.h
#pragma once
#include <boost/asio.hpp>
#include <string>
#include "connection.h"
#include "connection_manager.h"

class Server
{
public:
    Server(const Server&) = delete;
    Server& operator=(const Server&) = delete;

    /// Construct the server to listen on the specified TCP address and port, and
    /// serve up files from the given directory.
    explicit Server(const std::string& address, const std::string& port);

    /// Run the server's io_service loop.
    void run();

    void deliver(const std::string& buffer);

private:
    /// Perform an asynchronous accept operation.
    void do_accept();

    /// Wait for a request to stop the server.
    void do_await_stop();

    /// The io_service used to perform asynchronous operations.
    boost::asio::io_service io_service_;

    /// The signal_set is used to register for process termination notifications.
    boost::asio::signal_set signals_;

    /// Acceptor used to listen for incoming connections.
    boost::asio::ip::tcp::acceptor acceptor_;

    /// The connection manager which owns all live connections.
    ConnectionManager connection_manager_;

    /// The *NEXT* socket to be accepted.
    boost::asio::ip::tcp::socket socket_;
};
server.cpp
#include "server.h"
#include <signal.h>
#include <utility>

Server::Server(const std::string& address, const std::string& port)
    : io_service_()
    , signals_(io_service_)
    , acceptor_(io_service_)
    , connection_manager_()
    , socket_(io_service_)
{
    // Register to handle the signals that indicate when the server should exit.
    // It is safe to register for the same signal multiple times in a program,
    // provided all registration for the specified signal is made through Asio.
    signals_.add(SIGINT);
    signals_.add(SIGTERM);
#if defined(SIGQUIT)
    signals_.add(SIGQUIT);
#endif // defined(SIGQUIT)

    do_await_stop();

    // Open the acceptor with the option to reuse the address (i.e. SO_REUSEADDR).
    boost::asio::ip::tcp::resolver resolver(io_service_);
    boost::asio::ip::tcp::endpoint endpoint = *resolver.resolve({address, port});
    acceptor_.open(endpoint.protocol());
    acceptor_.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
    acceptor_.bind(endpoint);
    acceptor_.listen();

    do_accept();
}

void Server::run()
{
    // The io_service::run() call will block until all asynchronous operations
    // have finished. While the server is running, there is always at least one
    // asynchronous operation outstanding: the asynchronous accept call waiting
    // for new incoming connections.
    io_service_.run();
}

void Server::do_accept()
{
    acceptor_.async_accept(socket_,
        [this](boost::system::error_code ec)
        {
            // Check whether the server was stopped by a signal before this
            // completion handler had a chance to run.
            if (!acceptor_.is_open())
            {
                return;
            }

            if (!ec)
            {
                connection_manager_.start(std::make_shared<Connection>(
                    std::move(socket_), connection_manager_));
            }

            do_accept();
        });
}

void Server::do_await_stop()
{
    signals_.async_wait(
        [this](boost::system::error_code /*ec*/, int /*signo*/)
        {
            // The server is stopped by cancelling all outstanding asynchronous
            // operations. Once all operations have finished the io_service::run()
            // call will exit.
            acceptor_.close();
            connection_manager_.stop_all();
        });
}

/// this function is used to keep clients up to date with the changes, not used during syncing phase.
void Server::deliver(const std::string& buffer)
{
    connection_manager_.sendAllConnections(buffer);
}
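So I am repeating my question: my server crashes when I gracefully close a client that is connected to it while that client is receiving a large chunk of data, and I do not know why.
Edit: the crash happens in the async_write function, once I receive the EPIPE error. The application is multithreaded. There are 4 threads that call Server::deliver with the data they produce. deliver() is used to keep clients up to date; it has nothing to do with the initial syncing, which is done with persistent data fetched from the DB.
I had a single io_service, so I thought that I would not need strands. io_service::run is called on the main thread, so the main thread is blocking.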
Comments (1)
Reviewing, adding some missing code bits:
Now the things I notice are:
You have a single io_service, so a single thread. Okay, so no strands should be required unless you have threads in your other code (main, e.g.?).
A particular reason to suspect that threads are at play is that nobody could possibly call Server::deliver, because run() is blocking. This means that whenever you call deliver() now, it causes a data race, which leads to Undefined Behaviour.
The casual comment on Server::deliver ("this function is used to keep clients up to date with the changes, not used during syncing phase") does not do much to remove this concern. The code needs to defend against misuse. Comments do not get executed. Make it better:
You do not check that previous writes are completed before accepting a "new" one. This means that calling Connection::do_write results in Undefined Behaviour for two reasons: modifying outgoing_buffer_ during an ongoing async operation that uses that buffer is UB, and having two overlapped async_write operations on the same IO object is UB (see docs). The typical way to fix that is to have a queue of outgoing messages instead.
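The queue idea can be sketched without Boost, as a rough illustration rather than the answer's actual fix. `FakeSocket` and `QueuedWriter` are hypothetical names invented here: `FakeSocket` only mimics the gap between starting an asynchronous write and its completion handler running, and stands in for a real socket plus `boost::asio::async_write`. The sketch shows that a new message never touches the buffer of the write in flight, and that at most one write is outstanding at a time.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an async socket (NOT a Boost.Asio type).
// It records each write and parks the completion handler until
// complete_one() is called.
struct FakeSocket {
    std::vector<std::string> wire;              // data "sent", in order
    std::deque<std::function<void()>> pending;  // handlers not yet run
    int in_flight = 0;
    int max_in_flight = 0;

    void async_write(const std::string& data, std::function<void()> on_done) {
        ++in_flight;
        if (in_flight > max_in_flight) max_in_flight = in_flight;
        wire.push_back(data);
        pending.push_back(std::move(on_done));
    }

    // Simulate the I/O layer completing the oldest outstanding write.
    void complete_one() {
        assert(!pending.empty());
        auto handler = std::move(pending.front());
        pending.pop_front();
        --in_flight;
        handler();
    }
};

// Serializes writes: messages are queued, and the next write starts only
// from the completion handler of the previous one.
class QueuedWriter {
public:
    explicit QueuedWriter(FakeSocket& socket) : socket_(socket) {}

    void write(std::string message) {
        outgoing_.push_back(std::move(message));
        // Only start a write if this message is the sole queue entry,
        // i.e. nothing is currently in flight.
        if (outgoing_.size() == 1) write_front();
    }

    std::size_t queued() const { return outgoing_.size(); }

private:
    void write_front() {
        if (outgoing_.empty()) return;
        // front() is not modified until its write completes; push_back on
        // a std::deque does not invalidate references to existing elements.
        socket_.async_write(outgoing_.front(), [this] {
            outgoing_.pop_front();
            write_front();
        });
    }

    FakeSocket& socket_;
    std::deque<std::string> outgoing_;
};
```

In a real Asio connection the handler would also capture `shared_from_this()` and check the error code before continuing; both are left out here to keep the sketch small.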
Using async_read_some is rarely what you want, especially since the reads don't accumulate into a dynamic buffer. This means that if your packets get separated at unexpected boundaries, you may not detect commands at all, or detect them incorrectly. Instead consider asio::async_read_until with a dynamic buffer (e.g. std::string, so you don't have to copy the buffer into a string, or streambuf, so you can use std::istream(&sbuf_) to parse instead of tokenizing).
Concatenating all_json_strs, which clearly have to be owning text containers, is wasteful. Instead, use a const-buffer-sequence to combine them all without copying. Better yet, consider a streaming approach to JSON serialization so not all the JSON needs to be serialized in memory at any given time.
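To illustrate the point about reads that do not accumulate, here is a small Boost-free sketch. It assumes commands are newline-terminated, which the question does not state, and `LineAccumulator` is a hypothetical helper name. It only demonstrates the framing idea that `async_read_until` with a dynamic buffer would give you: a command split across two chunks is still recognized once its delimiter arrives.

```cpp
#include <string>
#include <vector>

// Accumulates raw chunks (as a read_some-style call might deliver them)
// and returns only complete, newline-terminated commands.
// Assumption: '\n' is the command delimiter.
class LineAccumulator {
public:
    // Append one chunk; return every complete line now available,
    // without the trailing delimiter.
    std::vector<std::string> feed(const std::string& chunk) {
        pending_ += chunk;
        std::vector<std::string> lines;
        std::string::size_type pos;
        while ((pos = pending_.find('\n')) != std::string::npos) {
            lines.push_back(pending_.substr(0, pos));
            pending_.erase(0, pos + 1);
        }
        return lines;
    }

    // Bytes received so far that do not yet form a complete command.
    const std::string& partial() const { return pending_; }

private:
    std::string pending_;
};
```

Feeding "sy" and then "nc\n" yields nothing for the first chunk and the single command "sync" for the second, whereas inspecting each chunk in isolation would never see "sync".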
Don't declare empty destructors (~Connection). They're pessimizations. Likewise for empty constructors (ConnectionManager). If you must, consider
The getNativeHandle gives me more questions about other code that may interfere. E.g. it may indicate other libraries doing operations, which again can lead to overlapped reads/writes, or it could be a sign of more code living on threads (as Server::run() is by definition blocking).
Connection manager should probably hold weak_ptr, so Connections could eventually terminate. Now, the last reference is by definition held in the connection manager, meaning nothing ever gets destructed when the peer disconnects or the session fails for some other reason.
This is not idiomatic: the acceptor_.is_open() check at the top of the accept handler. If you closed the acceptor, the completion handler is called with error::operation_aborted anyway. Simply handle that, e.g. in the final version I'll post later:
I notice this comment in do_await_stop: "The server is stopped by cancelling all outstanding asynchronous operations."
In fact you never cancel() any operation on any IO object in your code. Again, comments aren't executed. It's better to indeed do as you say, and let the destructors close the resources. This prevents spurious errors when objects are used after close, and also prevents very annoying race conditions when e.g. you closed the handle, some other thread re-opened a new stream on the same file descriptor, and you had given out the handle to a third party (using getNativeHandle)... you see where this leads?
Reproducing The Problem?
Having reviewed this way, I tried to repro the issue, so I created fake data:
With some minor tweaks to the Connection class:
We get the server outputting
And clients faked with netcat:
Good. Now let's cause premature disconnect:
So, it does lead to early close, but server still says
Let's instrument do_write as well:
Now we see:
For one disconnected and one "okay" connection.
No sign of crashes/undefined behaviour. Let's check with -fsanitize=address,undefined: clean record, even adding a heartbeat:
Conclusion
The only problems highlighted above that weren't addressed were additional threading issues not shown (perhaps via getNativeHandle), and the fact that you can have overlapping writes in the Connection do_write. Fixing that:
As you can see I also split write/do_write to prevent off-strand invocation. Same with stop.
Full Listing
A full listing with all the remarks/fixes from above:
File connection.h
File connection_manager.h
File server.h
File connection.cpp
File connection_manager.cpp
File server.cpp
File test.cpp
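As a supplement to the weak_ptr remark in the review, the following is a minimal sketch of a manager that observes sessions without owning them. `Session` and `WeakRegistry` are hypothetical names and this is not the listing from the answer. It assumes that ownership lives elsewhere (in an Asio program, typically in the `shared_ptr` captured by pending completion handlers), so a session is destroyed as soon as its last owner lets go.

```cpp
#include <cstddef>
#include <memory>
#include <set>

// Hypothetical session type; a real one would wrap a socket.
struct Session {
    explicit Session(int id) : id(id) {}
    int id;
};

// Tracks sessions without keeping them alive.
class WeakRegistry {
public:
    void add(const std::shared_ptr<Session>& session) {
        sessions_.insert(session);
    }

    // Remove entries whose session has been destroyed; return live count.
    std::size_t prune() {
        for (auto it = sessions_.begin(); it != sessions_.end();) {
            if (it->expired()) it = sessions_.erase(it);
            else ++it;
        }
        return sessions_.size();
    }

    // Call f on every session that is still alive.
    template <class F>
    void for_each_live(F f) const {
        for (const auto& weak : sessions_) {
            if (auto session = weak.lock()) f(*session);
        }
    }

private:
    // owner_less orders weak_ptrs by control block, which stays valid
    // as an ordering even after the pointee has expired.
    std::set<std::weak_ptr<Session>,
             std::owner_less<std::weak_ptr<Session>>> sessions_;
};
```

With this shape, a broadcast skips sessions that have already gone away instead of extending their lifetime.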