C++ – Socket send() call blocked for too long

A socket send() call was blocked for an unexpectedly long time; here is a solution to the problem.

The socket send() call was blocked for too long

I send 2 bytes of application data on the socket (blocking) every 10 seconds, but in the last instance below the send call was blocked for more than 40 seconds.

  • 2012-06-13 12:02:46.653417| INFO | before send
  • 2012-06-13 12:02:46.653457| INFO | after send (2)
  • 2012-06-13 12:02:57.566898| INFO | before send
  • 2012-06-13 12:02:57.566962| INFO | after send (2)
  • 2012-06-13 12:03:08.234060| INFO | before send
  • 2012-06-13 12:03:08.234101| INFO | after send (2)
  • **2012-06-13 12:03:19.010743| INFO | before send**
  • **2012-06-13 12:04:00.969162| INFO | after send (2)**

The default TCP send buffer size on the machine (Linux) is 65536 bytes.
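
(For reference: the effective send buffer size can be checked at runtime with getsockopt(). A minimal sketch, assuming direct access to the socket's raw descriptor, which the original ACE-based code wraps:)

    #include <sys/socket.h>
    #include <cstdio>

    // Print the kernel's effective send buffer size for a socket.
    // Note that on Linux the reported value includes bookkeeping
    // overhead and is typically double what was requested via SO_SNDBUF.
    void print_sndbuf(int fd) {
        int sndbuf = 0;
        socklen_t len = sizeof(sndbuf);
        if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &sndbuf, &len) == 0)
            std::printf("SO_SNDBUF = %d bytes\n", sndbuf);
        else
            std::perror("getsockopt(SO_SNDBUF)");
    }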

The 2 bytes of data are a heartbeat to the server, which expects the client to send heartbeats at least every 15 seconds.

Also, I have not disabled Nagle's algorithm.

The question is: can the send call really be blocked for as long as 40 seconds? It happens only occasionally, after almost 12 hours of operation.

As far as I know, the send call should just copy the data to the TCP send buffer.

publish() is called every 10 seconds. The send calls do not slow down gradually; the block happens suddenly, once, and then the application exits because the socket is closed on the other side.

int publish(char* buff, int size) const {
    /* Append the 0x0A terminator; buff must have room for size+1 bytes */
    buff[size] = _eolchar;

    if (_debugMode)
    {
        ACE_DEBUG((MY_INFO "before send\n"));
    }

    /* Blocking send of the payload plus the terminator */
    int ret = _socket.send((void*)buff, size + 1);

    if (_debugMode)
    {
        ACE_DEBUG((MY_INFO "after send (%d)\n", ret));
        std::cout << "after send " << ret << std::endl;
    }

    if (ret < 1)
    {
        ACE_DEBUG((MY_ERROR "Socket error, FH going down\n"));
        ACE_OS::sleep(1);
        abort();
    }
    return ret;
}

Solution

When using blocking send() calls, you can treat the remote TCP receive buffer, the data in flight on the network, and the local TCP send buffer as one large buffer from your application’s perspective.

That is, if the remote application experiences a delay in reading new bytes from its TCP receive buffer, eventually your local TCP send buffer will be (almost) full. If you then attempt to send() a new payload that would overflow the send buffer, the send() implementation (a kernel system call) will not return control to your application until the buffer has enough free space to store that payload.

The only way to reach this state is for the remote application to not read bytes fast enough. A typical scenario in a test environment is a remote application paused at a breakpoint… :-)
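
(As an aside not covered by the original answer: if you only want to bound how long a blocking send() may stall, POSIX sockets let you set a send timeout, after which send() fails with errno set to EAGAIN/EWOULDBLOCK instead of blocking forever. A minimal sketch, again assuming access to the raw descriptor:)

    #include <sys/socket.h>
    #include <sys/time.h>

    // Bound blocking send() calls to 'seconds'; after that, send()
    // returns -1 with errno == EAGAIN/EWOULDBLOCK instead of stalling.
    bool set_send_timeout(int fd, long seconds) {
        struct timeval tv;
        tv.tv_sec = seconds;
        tv.tv_usec = 0;
        return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == 0;
    }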

This is what we call the **slow consumer problem**. If you share that diagnosis, there are several ways to fix the problem:

  1. If you control the remote application, make it read fast enough that the local application is never blocked.
  2. If you don’t control the remote application, there are several options:
    • Depending on your needs, blocking for up to 40 seconds may be acceptable, in which case you can leave things as they are.
    • If not, you need to use the non-blocking version of the send() system call. From there, multiple policies are possible; I describe one below (see the sketches after this list). :-)
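
(First, a minimal sketch of switching the socket to non-blocking mode with fcntl(), assuming access to the raw POSIX descriptor; the original code goes through an ACE wrapper, which has its own way of enabling non-blocking I/O:)

    #include <fcntl.h>

    // Put the socket into non-blocking mode: send() now returns -1 with
    // errno == EWOULDBLOCK instead of blocking when the buffer is full.
    bool set_nonblocking(int fd) {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1)
            return false;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
    }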

You can maintain a dynamic array that acts as a user-space extension of the TCP send FIFO and grows whenever send() returns EWOULDBLOCK. In that case, you will also have to use the select() system call to detect when the socket becomes writable again, and send the buffered, not-yet-sent data first.

This is a bit tricky for a simple publish() function like yours (although the pattern is common in most network applications). You must also be aware that nothing guarantees the dynamic buffer will stop growing before you run out of available memory, which would crash your local application. A typical strategy in production network applications is to choose an arbitrary maximum size for the buffer and close the TCP connection when it is reached, preventing the local application from exhausting memory. Choose the maximum wisely, as it depends on the number of potential slow-consumer connections.
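
(Below is a minimal sketch of that policy over plain POSIX sockets. Everything here is illustrative: PendingSender, kMaxPending, and the 1 MiB cap are assumptions, not part of the original code, which uses an ACE_SOCK_Stream wrapper:)

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <cerrno>
    #include <cstddef>
    #include <vector>

    // Illustrative user-space send queue in front of one non-blocking socket.
    class PendingSender {
    public:
        explicit PendingSender(int fd) : _fd(fd) {}

        // Queue-aware send: older bytes are flushed before the new payload.
        // Returns false once the queue exceeds the cap and the connection
        // has been closed (the "give up on the slow consumer" policy).
        bool publish(const char* data, size_t size) {
            _pending.insert(_pending.end(), data, data + size);
            return flush();
        }

        // Call this whenever select() reports the socket writable.
        bool flush() {
            while (!_pending.empty()) {
                // MSG_NOSIGNAL (Linux) avoids SIGPIPE if the peer is gone.
                ssize_t n = ::send(_fd, &_pending[0], _pending.size(),
                                   MSG_NOSIGNAL);
                if (n > 0) {
                    _pending.erase(_pending.begin(), _pending.begin() + n);
                } else if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN)) {
                    break;  // kernel buffer full; keep the rest for later
                } else {
                    ::close(_fd);
                    return false;  // real socket error
                }
            }
            if (_pending.size() > kMaxPending) {  // arbitrary cap
                ::close(_fd);                     // slow consumer: give up
                return false;
            }
            return true;
        }

        // Wait (via select) until the socket is writable, up to 'sec' seconds.
        bool wait_writable(long sec) {
            fd_set wfds;
            FD_ZERO(&wfds);
            FD_SET(_fd, &wfds);
            struct timeval tv = {sec, 0};
            return ::select(_fd + 1, 0, &wfds, 0, &tv) > 0;
        }

    private:
        static const size_t kMaxPending = 1 << 20;  // 1 MiB, chosen arbitrarily
        int _fd;
        std::vector<char> _pending;
    };

The caller’s loop would then call publish() for each heartbeat and, when it returns with data still queued, call wait_writable() followed by flush() instead of letting a blocking send() stall the whole application.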
