
Ingesting Data from Kafka Queues Deployed On-Prem into jSonar Cloud Systems

Most jSonar systems are deployed on the Cloud yet consume data generated within enterprise data centers. Since Kafka has emerged as one of the most popular integration platforms and enterprise service bus architectures, this tech note describes how to integrate and consume data produced and placed on an on-prem Kafka queue with a jSonar Cloud service. As shown in Figure 1, the setup is based on rsyslog. Rsyslog is available as part of any RHEL system.

Figure 1: Data flow between on-prem Kafka and the jSonar Cloud service

Rsyslog is configured to be a Kafka consumer and subscribes to any number of Kafka topics. Rsyslog is configured to compress and encrypt the data as it passes over the public Internet. It is also normally configured to batch the sending of the data rather than stream it. All of these are standard rsyslog configurations. Data arrives in the jSonar Cloud through the rsyslog layer and is processed by SonarGateway. Any number of enrichment, mapping and analytic rules can be defined on the Gateway, and once the JSON documents have been processed they are inserted into the SonarW column-store.


1. Create Secure SonarGateway Connections

This section describes how to set up Transport Layer Security (TLS) connections between the on-prem Kafka host and the jSonar Cloud. Authentication is two-way: the sender authenticates with the receiver, and vice versa. Communication is encrypted and compressed.

We’ll call the machines:

1. kafka -- the on-prem machine forwarding kafka events to sonargateway

2. sonargateway -- the jSonar Cloud machine receiving events.

In the following, assume that sonargateway and kafka are Red Hat machines. Note that the instructions work for multiple kafka machines: you can copy the same kafka certificates to multiple machines (if that suits your security needs) and they will authenticate in the same fashion.

1. Configuration On-Prem (called the agent host or the Kafka source host)

1.1 Certificate creation

$ sudo cat <<'EOF' | sudo tee /etc/yum.repos.d/rsyslog.repo
[rsyslog_v8]
name=Adiscon CentOS-$releasever - local packages for $basearch
baseurl=http://rpms.adiscon.com/v8-stable/epel-7/$basearch
enabled=1
gpgcheck=0
gpgkey=http://rpms.adiscon.com/RPM-GPG-KEY-Adiscon
protect=1
EOF

$ sudo yum update rsyslog
$ sudo yum install -y gnutls-utils rsyslog-gnutls
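If you want to confirm that the Adiscon v8 packages and the TLS helper packages are in place before continuing (an optional check, not part of the original procedure), you can query the installed versions:

$ rsyslogd -v                                   ## should report an 8.x version
$ rpm -q rsyslog rsyslog-gnutls gnutls-utils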

Generate a private key and a self-signed CA certificate. You will be asked some questions during the certificate generation; for this example we chose simple values and used mostly the defaults. Configure the certificate generation to suit your organization:

$ certtool --generate-privkey --outfile ca-key.pem
Generating a 2048 bit RSA private key...
$ certtool --generate-self-signed --load-privkey ca-key.pem --outfile ca.pem
Generating a self signed certificate...
Details of the certificate's distinguished name. Press enter to ignore a field.
Common name: SonarGateway
.....


State or province name: BC
Country name (2 chars): CA
.....
The certificate will expire in (days): 365
Extensions.
Does the certificate belong to an authority? (y/N): y
.....
Enter the e-mail of the subject of the certificate: [email protected]
.....
Will the certificate be used to sign other certificates? (y/N): y
.....
Other Information:
Public Key ID: cf5168551caeda4d2ea28f873ed924e74a76208c
.....
Is the above information ok? (y/N): y
Signing certificate...

$ chmod 400 ca-key.pem
$ ls -l
total 12
-r--------. 1 root root 5823 Oct 20 13:31 ca-key.pem
-rw-r--r--. 1 root root 1216 Oct 20 13:37 ca.pem

The security of the TLS connections depends on the security of ca-key.pem. Keep this file safe.
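If you prefer to avoid the interactive prompts, certtool can also read the answers from a template file. The following is only a sketch; the field values mirror the example answers above and should be adjusted to your organization:

$ cat > ca.tmpl <<'EOF'
cn = "SonarGateway"
state = "BC"
country = CA
email = "[email protected]"
expiration_days = 365
# the certificate belongs to an authority and may sign other certificates
ca
cert_signing_key
EOF
$ certtool --generate-self-signed --load-privkey ca-key.pem --template ca.tmpl --outfile ca.pem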

Now generate certificates for the sonargateway and kafka machines, similar to the CA certificate. The key here is the Common name setting. You will have to refer to these names later when you configure rsyslog.

Repeat this section twice -- once for the sonargateway machine and once for the kafka machine.

$ certtool --generate-privkey --outfile key.pem
Generating a 2048 bit RSA private key...
$ certtool --generate-request --load-privkey key.pem --outfile request.pem
…

Common name: sonargateway    ###### <----- 2nd time use 'kafka' ######
…
State or province name: BC
Country name (2 chars): CA

……
Is this a TLS web client certificate? (y/N): y
Is this a TLS web server certificate? (y/N): y
Self signature: verified


$ certtool --generate-certificate --load-request request.pem --outfile cert.pem \
    --load-ca-certificate ca.pem --load-ca-privkey ca-key.pem
Generating a signed certificate...
.....
The certificate will expire in (days): 365
.....
Is this a TLS web client certificate? (y/N): y
.....
Is this a TLS web server certificate? (y/N): y
.....
Other Information:
Public Key ID: 12af4f2b06018fc1f114a030f763385f1eeed463
...
Is the above information ok? (y/N): y
Signing certificate...

Now rename the two generated files:

$ mv cert.pem sonargateway-cert.pem ## <---- 2nd time, to kafka-cert.pem

$ mv key.pem sonargateway-key.pem ## <---- 2nd time, to kafka-key.pem

Repeat these steps to generate a private key and certificate for the kafka machine.
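Before copying the files, you can optionally confirm that both certificates verify against the CA you created (an extra sanity check, not part of the original procedure); each file should report OK:

$ openssl verify -CAfile ca.pem sonargateway-cert.pem kafka-cert.pem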

Once you’ve created both key/certificate pairs, copy the sonargateway*.pem and ca.pem files to the sonargateway machine. The command can look something like this:

## if sonargateway machine IP is 54.183.244.39...
$ scp -i ~/.ssh/aws.pem {sonargateway*,ca}.pem [email protected]:

On the kafka machine, move the remaining files into place, restrict their permissions, and set an SELinux type for them. By using semanage you can keep SELinux in 'Enforcing' mode:

$ sudo bash
$ mkdir -p /etc/rsyslog.d/sonar/gateway/tls
$ mv *pem /etc/rsyslog.d/sonar/gateway/tls
$ chmod 400 /etc/rsyslog.d/sonar/gateway/tls/*.pem
$ chown root:root /etc/rsyslog.d/sonar/gateway/tls
$ chown root:root /etc/rsyslog.d/sonar/gateway/tls/*
$ semanage fcontext -a -t syslog_conf_t "/etc/rsyslog.d/sonar/gateway/tls(/.*)?"
$ restorecon /etc/rsyslog.d/sonar/gateway/tls/*
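To confirm that the files received the expected SELinux label, you can list them with their security contexts; each file should show the syslog_conf_t type:

$ ls -Z /etc/rsyslog.d/sonar/gateway/tls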


1.2 Configure Rsyslog to communicate with the jSonar Master/Gateway Host

Install the imkafka input module for rsyslog:

$ sudo yum install rsyslog-kafka

In what follows, assume that the kafka machine is running the broker. If your broker is a separate on-prem machine you can also configure a TLS connection between that broker and the kafka machine. See the rsyslog imkafka documentation and the Apache Kafka documentation for TLS.

imkafka: https://www.rsyslog.com/doc/v8-stable/configuration/modules/imkafka.html

Apache Kafka: http://kafka.apache.org/documentation/#security
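If you do enable TLS on the Kafka side, imkafka can pass the relevant librdkafka options through its confparam parameter (see the imkafka documentation linked above). The input below is only a sketch under that assumption; the broker address and certificate path are illustrative and must match your broker's TLS setup:

input(type="imkafka"
      broker=["broker.example.internal:9093"]   # TLS listener of the remote broker (illustrative)
      topic="topic1"
      consumergroup="default"
      ruleset="10546_kafka_forward"
      confparam=["security.protocol=ssl",
                 "ssl.ca.location=/etc/rsyslog.d/sonar/gateway/tls/broker-ca.pem"])   # CA that signed the broker certificate (illustrative path)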

Now create the rsyslog configuration that consumes the Kafka topic and forwards it to the gateway:

$ cat >> /etc/rsyslog.d/kafka_forward.conf <<'EOF'
global(defaultNetstreamDriverCAFile="/etc/rsyslog.d/sonar/gateway/tls/ca.pem"
       defaultNetstreamDriverCertFile="/etc/rsyslog.d/sonar/gateway/tls/kafka-cert.pem"   # the kafka machine's certificate and key from section 1.1
       defaultNetstreamDriverKeyFile="/etc/rsyslog.d/sonar/gateway/tls/kafka-key.pem")
module(load="imkafka")
input(type="imkafka"
      broker=["192.168.1.43:9092"]                     # the ip/port of your kafka broker
      topic="topic1"
      consumergroup="default"
      ruleset="10546_kafka_forward")
ruleset(name="10546_kafka_forward") {
    action(type="omfwd"
           keepalive="on"
           streamdriver="gtls"
           streamdrivermode="1"
           streamdriverauthmode="x509/name"
           streamdriverpermittedpeers="sonargateway"   # must match the certificate common name
           ziplevel="9"                                # for compression
           compression.mode="single"                   # for compression
           protocol="tcp"
           queue.type="Disk"                           # for message queuing
           queue.spoolDirectory="/var/lib/rsyslog"
           queue.filename="rsyslog_kafka_queue"
           action.resumeRetryCount="-1"
           target="18.206.216.238"                     # the ip address of the jSonar Cloud
           port="10546")
    stop
}
EOF

Restart the rsyslog service and use lsof to check that rsyslog has connected to the kafka broker on port 9092:

$ sudo systemctl restart rsyslog
$ lsof -i :9092    ## connections to the kafka broker
COMMAND    PID USER  FD  TYPE   DEVICE SIZE/OFF NODE NAME
rsyslogd 11033 root 11u  IPv4 33097466      0t0  TCP kafka:56578->kafka_broker:XmlIpcRegSvc (ESTABLISHED)
rsyslogd 11033 root 16u  IPv4 33097468      0t0  TCP kafka:56580->kafka_broker:XmlIpcRegSvc (ESTABLISHED)

2. Configurations On jSonar Master Host (Gateway/Master)

2.1 Configure SonarGateway

Create a file that references the certificates:

$ cat >> /etc/rsyslog.d/01-global-tls.conf <<'EOF'
global(defaultNetstreamDriverCAFile="/etc/rsyslog.d/sonar/gateway/tls/ca.pem"
       defaultNetstreamDriverCertFile="/etc/rsyslog.d/sonar/gateway/tls/sonargateway-cert.pem"
       defaultNetstreamDriverKeyFile="/etc/rsyslog.d/sonar/gateway/tls/sonargateway-key.pem")
EOF

Then modify the input modules file:

$ sed -i '/imtcp/d' /etc/rsyslog.d/sonar/gateway/include/modules.conf
$ cat >> /etc/rsyslog.d/sonar/gateway/include/modules.conf <<'EOF'
module(load="imtcp"
       streamdriver.name="gtls"
       streamdriver.mode="1"
       streamdriver.authmode="x509/name"
       permittedpeer=["kafka"]
       keepalive="on"
)
EOF

From the directory where the certificate and key files were copied, run the following commands:

$ sudo bash
$ mkdir -p /etc/rsyslog.d/sonar/gateway/tls
$ mv *pem /etc/rsyslog.d/sonar/gateway/tls
$ chmod 400 /etc/rsyslog.d/sonar/gateway/tls/*.pem
$ chown root:root /etc/rsyslog.d/sonar/gateway/tls
$ chown root:root /etc/rsyslog.d/sonar/gateway/tls/*
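If SELinux is enforcing on the gateway host as well, you can label these files in the same way as on the kafka machine:

$ semanage fcontext -a -t syslog_conf_t "/etc/rsyslog.d/sonar/gateway/tls(/.*)?"
$ restorecon /etc/rsyslog.d/sonar/gateway/tls/*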

2.2 Configure the SonarGateway Kafka Service

Edit /etc/sonar/gateway/kafka_consumer.json. In the second output_connection (around line 45), add two attributes to specify the database into which the data should be inserted (target_db) and the collection into which it should be inserted (default_collection). For example:

{ "output_connection": { "event_format": { "standard": "JSON" }, "target_db": "my_database”, "default_collection": "my_kafka_data", "unique_label": "json", "group_label": "json" } } Uncomment the kafka listener in the sonargateway rsyslog configuration:

Uncomment the kafka listener in the sonargateway rsyslog configuration:

$ vi /etc/rsyslog.d/sonargateway.conf

#$IncludeConfig /etc/rsyslog.d/sonar/gateway/rulesets/ingest.conf

$IncludeConfig /etc/rsyslog.d/sonar/gateway/rulesets/kafka_consumer.conf

#$IncludeConfig /etc/rsyslog.d/sonar/gateway/rulesets/mapr.conf
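Before restarting, you can ask rsyslog to validate the combined configuration without starting the daemon (an optional check); it should end without reporting errors:

$ sudo rsyslogd -N1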


Restart the rsyslog service and use lsof to check that rsyslog is listening on port 10546.

$ sudo systemctl restart rsyslog

$ lsof -i :10546
COMMAND    PID USER  FD  TYPE   DEVICE SIZE/OFF NODE NAME
rsyslogd 29278 root  5u  IPv4 17478673      0t0  TCP *:10546 (LISTEN)
rsyslogd 29278 root  6u  IPv6 17478674      0t0  TCP *:10546 (LISTEN)
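As a final connectivity check from the on-prem side, you can attempt a TLS handshake against the gateway using the kafka machine's certificate (the address below is the example gateway IP used earlier). A successful handshake ending with 'Verify return code: 0 (ok)' indicates that the certificates and the TLS listener are set up correctly:

$ openssl s_client -connect 18.206.216.238:10546 \
    -cert /etc/rsyslog.d/sonar/gateway/tls/kafka-cert.pem \
    -key /etc/rsyslog.d/sonar/gateway/tls/kafka-key.pem \
    -CAfile /etc/rsyslog.d/sonar/gateway/tls/ca.pem </dev/null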