Hello Networking Jedi Masters,

I recently set up vFlow, an open-source sFlow collector from Verizon Digital.
The collector works great, is quite stable, and easy to manage. I can’t say enough nice things about it.

But there is a weird hitch about which I wanted to ask this forum. Once I got my sFlow statistics, I wanted to know the throughput of my network applications, which seems quite reasonable, right? To estimate the total number of bytes sent in one sampled flow, I have code which calculates the following:

( Size of Sampled Packet Payload )  *  ( Sampling Rate )

In sFlow, these statistics would be:

( Packet.L3.TotalLen )  *  ( Sample.SamplingRate )

In other words, if vFlow samples a packet whose IP total length (Packet.L3.TotalLen) is 1,500 bytes, and my sampling rate is 1/64 (one packet sampled out of every 64), then I’ll estimate the entire flow as:

( 1500 bytes )  *  (  64 packets in this flow )  =  96,000 bytes in this flow
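
For concreteness, here is a minimal Python sketch of that naive estimate. The function and parameter names are made up for illustration (they are not vFlow's API); the only inputs are the sampled packet's L3 total length and the sampling rate expressed as N for "1 in N".

```python
def estimate_flow_bytes(total_len: int, sampling_rate: int) -> int:
    """Naive estimate: each sampled packet stands in for `sampling_rate` packets.

    total_len     -- Packet.L3.TotalLen of the sampled packet, in bytes
    sampling_rate -- Sample.SamplingRate, i.e. N for a 1-in-N sampling rate
    """
    return total_len * sampling_rate


# The example above: a 1,500-byte packet sampled at 1/64.
print(estimate_flow_bytes(1500, 64))  # 96000
```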

Simple. But when I did very basic measurement tests with iPerf, I noticed that this calculation was always off. And always off by a consistent amount. Doing a little reverse engineering by watching packet sizes, sampling rate, and actual number of bytes sent, I realized that the formula I actually need is this:

( Packet.L3.TotalLen )  *  ( Sample.SamplingRate )  *  (  Some constant  )  =  Total bytes sent

Hmm.
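
Roughly, the reverse engineering boils down to dividing the bytes actually sent (as reported by iPerf) by the sum of the naive estimates over all samples from the test. A minimal Python sketch, with hypothetical names and placeholder inputs that are NOT my measurements:

```python
def empirical_constant(actual_bytes, samples):
    """Ratio of bytes actually sent to the naive sFlow-based estimate.

    actual_bytes -- ground-truth byte count for the test (e.g. from iPerf)
    samples      -- iterable of (total_len, sampling_rate) pairs, one per sample
    """
    naive_total = sum(total_len * rate for total_len, rate in samples)
    if naive_total == 0:
        raise ValueError("no samples, cannot compute a ratio")
    return actual_bytes / naive_total


# Placeholder numbers purely for illustration (not real measurements):
# two samples of 500 bytes at 1/64 give a naive estimate of 64,000 bytes.
print(empirical_constant(128000, [(500, 64), (500, 64)]))  # 2.0
```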

I did a lot of experimenting on this, playing with packet sizes. (My production environment uses 1/64 as our universal sampling rate, so I didn’t bother varying that.) For those who are curious, here are the results I found:

Packet Payload (Bytes)    Constant
=====================================
100                       3.456997168
300                       4.647359333
500                       4.971690001
1000                      5.230202893
1500                      5.346555979

If you graph this, the curve levels off toward ~5.35 as the packet size grows. I ran hundreds of individual tests for each packet size, and the results were extremely consistent; vFlow is very stable.

I don’t mind hardwiring this mysterious constant into my code to calculate total bytes sent. But I’m completely mystified why the constant is needed at all. Shouldn’t my original formula have been enough to estimate total throughput? What could that constant signify? Is this an sFlow thing? Or perhaps a vFlow (mis)configuration that I’ve overlooked?
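
If it helps anyone reason about it, this is roughly what the hardwired workaround could look like in Python. It is only a sketch under assumptions: the table holds the constants I measured at a 1/64 sampling rate, the linear interpolation between measured sizes and the clamping outside the 100 to 1500 byte range are my own guesses rather than anything sFlow or vFlow defines, and all names are hypothetical.

```python
from bisect import bisect_left

# (packet size in bytes, measured constant), all taken at a 1/64 sampling rate.
MEASURED = [
    (100, 3.456997168),
    (300, 4.647359333),
    (500, 4.971690001),
    (1000, 5.230202893),
    (1500, 5.346555979),
]


def correction_constant(size):
    """Look up the constant for a packet size, interpolating linearly.

    Assumption: sizes outside the measured range are clamped to the
    nearest measured value, since I have no data there.
    """
    sizes = [s for s, _ in MEASURED]
    if size <= sizes[0]:
        return MEASURED[0][1]
    if size >= sizes[-1]:
        return MEASURED[-1][1]
    i = bisect_left(sizes, size)
    (x0, y0), (x1, y1) = MEASURED[i - 1], MEASURED[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def corrected_flow_bytes(total_len, sampling_rate=64):
    """Naive estimate multiplied by the empirically measured constant."""
    return total_len * sampling_rate * correction_constant(total_len)


print(corrected_flow_bytes(1500))
```

This only papers over the discrepancy; it says nothing about where the constant comes from, which is the actual question.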

Any thoughts are wildly appreciated. Thanks!

PS: The version of vFlow I am using is here; I opted for the Docker container version. More general info about vFlow is here.
