KFlow requires at least Java 1.8 to run. It is known to work with Apache Kafka 2.3.0, but any recent version should work. Linux is required to achieve high performance; macOS can be used for testing.
KFlow is distributed as a self-contained JAR (also known as a fat JAR). Simply download `kflow-VERSION-all.jar`
from the GitHub releases page and run the following
(assuming `java` is on your `PATH`):

```shell
java -jar kflow-VERSION-all.jar
```
You can also build the project yourself (see Building section).
By default KFlow listens for IPFIX on port 4739/udp and pushes decoded flows to the local Kafka broker `localhost:9092`, topic `ipfix`. It also exposes Prometheus metrics on port 8080/tcp. This behavior can be customized by overriding the default configuration.
Configuration values are resolved from multiple sources with the following precedence, from highest to lowest:

- Java system properties prefixed with `app.`, usually passed as one or more `-Dapp.name=value` command-line arguments,
- a custom `.properties` file specified with the `-c` or `--config` command-line argument,
- default configuration values.
| Property | Description |
|---|---|
| `server.port` | Port number to listen for IPFIX flows on |
| `server.threads` | Number of processing threads |
| `server.buffer.size` | `SO_RCVBUF` buffer size in bytes (allocated per processing thread) |
| `kafka.topic` | Kafka topic to write decoded flows to |
| `kafka.producers` | Number of Kafka producers (see Performance notes) |
| `kafka.props.bootstrap.servers` | Kafka brokers to set up the initial connection with |
| `kafka.props.*` | Other Kafka producer configuration properties (see producer docs) |
| `metrics.port` | Port number to expose Prometheus metrics on |
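For example, the defaults could be overridden with a custom properties file passed via `--config`. The property names below come from the table above; the values themselves are purely illustrative:

```properties
# kflow.properties -- illustrative values only
server.port=2055
server.threads=4
kafka.topic=ipfix
kafka.producers=2
kafka.props.bootstrap.servers=kafka1:9092,kafka2:9092
metrics.port=8080
```

Any individual value can still be overridden on the command line (e.g. `-Dapp.server.port=2055`), since system properties take precedence over the configuration file.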
KFlow produces JSON records with the following fields:
| Field | Type | Description |
|---|---|---|
| `dvc` | string (IPv4) | IPv4 address of the host that sent the IPFIX packet |
| `src` | string (IPv4) | sourceIPv4Address (8) |
| `srcp` | number | sourceTransportPort (7) |
| `dst` | string (IPv4) | destinationIPv4Address (12) |
| `dstp` | number | destinationTransportPort (11) |
| `proto` | number | protocolIdentifier (4) |
| `flags` | number | tcpControlBits (6) |
| `bytes` | number | octetDeltaCount (1) |
| `pkts` | number | packetDeltaCount (2) |
| `time` | number | Export time in milliseconds since the Unix epoch |
For descriptions of fields other than `dvc` and `time`, please refer to the IPFIX entities list.
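Put together, a single decoded flow record might look like this (all values are made up for illustration):

```json
{
  "dvc": "192.0.2.1",
  "src": "10.0.0.5",
  "srcp": 51234,
  "dst": "10.0.0.9",
  "dstp": 443,
  "proto": 6,
  "flags": 27,
  "bytes": 4096,
  "pkts": 12,
  "time": 1561000000000
}
```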
- Because of the way Netty handles UDP channels, KFlow requires the `epoll()` system call to be available to achieve high performance. It can still run without `epoll()` support, but all processing will then be done in a single thread regardless of the `server.threads` setting.
- At very high volumes (about 500,000 flows per second in our setup) a shared lock inside the Kafka producer code becomes a bottleneck. To overcome this limitation, KFlow can create multiple producers (as specified by the `kafka.producers` config value) and distribute load between them.
- Monitoring the receive queue and detecting packet drops is crucial for handling large volumes of traffic in production. KFlow exposes the `server_packet_drops` and `server_receive_queue` Prometheus metrics. Both are parsed from `/proc/net/udp` and are therefore available only on Linux.
- When the exposed metrics are not enough for performance troubleshooting, we recommend profiling tools such as the excellent async-profiler.
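To illustrate where those two metrics come from: each line of `/proc/net/udp` carries a hex-encoded `tx_queue:rx_queue` column and a trailing decimal drop counter. The sketch below parses a sample line in that format; it is an illustration of the kernel's file layout, not KFlow's actual parsing code.

```java
import java.util.Arrays;

public class UdpQueueStats {
    // Parses one non-header line of /proc/net/udp and returns
    // { rxQueueBytes, drops }.
    static long[] parse(String line) {
        String[] cols = line.trim().split("\\s+");
        // Column 4 is "tx_queue:rx_queue", both hex-encoded.
        String rxHex = cols[4].split(":")[1];
        long rxQueue = Long.parseLong(rxHex, 16);
        // The last column is the cumulative drop counter (decimal).
        long drops = Long.parseLong(cols[cols.length - 1]);
        return new long[] { rxQueue, drops };
    }

    public static void main(String[] args) {
        // Sample line for a socket bound to port 0x1283 (4739, the IPFIX default),
        // with 0x1F4 = 500 bytes queued and 42 packets dropped.
        String line = " 6640: 00000000:1283 00000000:0000 07"
                + " 00000000:000001F4 00:00000000 00000000   101"
                + "        0 21294 2 0000000000000000 42";
        System.out.println(Arrays.toString(parse(line))); // [500, 42]
    }
}
```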
KFlow uses Gradle as its build system. `./gradlew build` builds the project, and `./gradlew run` runs it.
The fat JAR is produced at `build/libs/kflow-VERSION-all.jar`. The build also assembles redistributable application
archives in the `build/distributions` folder.
KFlow was built with extensibility in mind.
Support for a new flow export format, such as NetFlow or sFlow, can be added by implementing the `PacketDecoder`
interface and passing it as a constructor parameter to the `Server` class. Handling multiple formats in a single
application is possible by creating multiple `Server` instances with different decoders and ports.
Similarly, the Kafka output can be replaced by a custom implementation of the `Sink` interface, which is likewise
passed as a constructor parameter to the `Server` class.
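A decoder for a new format might look roughly like the sketch below. The actual `PacketDecoder` signature is not shown in this document, so the interface shape here (a single method from a raw UDP payload to decoded records) is an assumption, and `NetflowDecoder` is a toy example:

```java
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.List;

// Assumed shape of KFlow's PacketDecoder interface -- hypothetical,
// check the real interface in the source before implementing.
interface PacketDecoder {
    List<String> decode(ByteBuffer payload);
}

// Toy NetFlow-style decoder showing where format-specific parsing
// would live; it only extracts the version from the packet header.
class NetflowDecoder implements PacketDecoder {
    @Override
    public List<String> decode(ByteBuffer payload) {
        if (payload.remaining() < 2) {
            return Collections.emptyList(); // too short to carry a version
        }
        int version = payload.getShort() & 0xFFFF; // first two bytes: version
        return Collections.singletonList("{\"version\":" + version + "}");
    }
}
```

Such a decoder would then be passed to its own `Server` instance on a dedicated port, alongside a second `Server` running the existing IPFIX decoder.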
- KFlow supports IPv4 addresses only. There are no plans to add IPv6 support.
- The exported field set is limited to basic fields for performance reasons.
  If you need a more feature-rich solution and don't care about performance as much, you should probably
  take a look at Cloudflare's goflow. In our setup a single goflow node was able to handle about 250,000 flows per second.