How to analyze hundreds of millions of security logs per day on a single machine in real time?


1. Preface

When building out security for a company, you inevitably have to process and analyze large volumes of data, such as company-wide WAF logs, process-derived data, and K8s logs, in order to discover risks. This is especially true in the early stage of building anti-intrusion strategies, when alerts must be timely but the strategies themselves change frequently. We need a system in which strategies can be written and new versions released smoothly.

That system is Elkeid HUB (developed and open-sourced by ByteDance), a strategy writing and analysis platform that meets the requirements above. After several months of use, I am quite satisfied with it.

2. Building a system that processes hundreds of millions of records per day


Elkeid HUB has to be used together with other systems: Kafka, Elasticsearch, and Kibana must be configured manually.

Kafka: mainly used to build real-time data pipelines and stream-processing applications. In the big data ecosystem, data is often generated and processed continuously, for example the user behavior logs of a website.

Elasticsearch: a distributed search and data analytics engine with a RESTful API.

Kibana: lets users explore and analyze the data stored in Elasticsearch more intuitively (used later for query analysis).
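
To get a feel for the load this pipeline has to carry, it helps to convert a daily volume into an average per-second rate. The sketch below does that arithmetic in shell. The figure of 300 million logs per day and the 3x peak factor are assumed values for illustration, not measurements from this deployment.

```shell
#!/usr/bin/env bash
# Convert a daily event count into an average events-per-second rate.
# All inputs below are illustrative assumptions, not measured values.
per_second() {
  local daily_events=$1
  echo $(( daily_events / 86400 ))   # 86400 seconds per day, integer division
}

avg=$(per_second 300000000)          # assumed: 300 million logs per day
peak=$(( avg * 3 ))                  # assumed: peaks at roughly 3x the average
echo "average: ${avg} events/s, assumed peak: ${peak} events/s"
# prints: average: 3472 events/s, assumed peak: 10416 events/s
```

A sustained rate of a few thousand events per second is what the single machine has to absorb on average, with headroom for bursts.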

2.1 ELKEID HUB

Elkeid HUB is a rule/event processing engine maintained by the Elkeid Team. It supports streaming and offline data processing (offline processing is not yet supported in the community edition). It was designed to handle complex data/event processing and integration with external systems through a standardized, abstract rule syntax.

Reference document: https://github.com/bytedance/Elkeid/blob/main/elkeidup/deploy_hub-zh_CN.md

The pros and cons are worth covering here.

Advantages: high performance, including special optimizations for JSON decoding and regular expressions (so I have heard from more experienced people), smooth strategy modification, and good visualization.

Shortcomings: Elkeid HUB has not been updated for a long time (some frontend bugs remain unfixed), and distributed deployment and plugin writing require purchasing the commercial edition. The lack of distributed deployment is, I think, the bigger pitfall.

Recommended system: Ubuntu 20 (Ubuntu 22 reports an incompatibility)

# Configure for key-based login
echo -e '\nPubkeyAuthentication yes' >> /etc/ssh/sshd_config
systemctl restart sshd


# Generate a key pair and authorize it for local login
mkdir -p ~/.ssh && chmod 700 ~/.ssh && cd ~/.ssh/
ssh-keygen -t rsa -b 4096
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys


# Test that key-based login works
ssh 127.0.0.1


# Download
mkdir -p /root/.elkeidup && cd /root/.elkeidup


wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_package_v1.9.1.tar.gz.00
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_package_v1.9.1.tar.gz.01
wget https://github.com/bytedance/Elkeid/releases/download/v1.9.1.4/elkeidup_package_v1.9.1.tar.gz.02
cat elkeidup_package_v1.9.1.tar.gz.* > elkeidup_package_v1.9.1.tar.gz


tar -xf elkeidup_package_v1.9.1.tar.gz
chmod a+x /root/.elkeidup/elkeidup


# ./elkeidup init --host {ip} --hub_only generates the configuration file config_example.yaml
./elkeidup init --host 10.234.170.51 --hub_only
cp config_example.yaml config.yaml


# Deployment
./elkeidup deploy --hub_only
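
The download step earlier joins several split parts with cat, and a truncated part only shows up later as a confusing tar error. The sketch below reproduces the split-and-rejoin flow on a small stand-in archive and checks the result before extraction. The file names (demo_package.tar.gz, rejoined.tar.gz) are hypothetical, and it assumes GNU split (for the -d option), which is what Ubuntu ships.

```shell
#!/usr/bin/env bash
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Build a small stand-in archive (the real one is elkeidup_package_v1.9.1.tar.gz).
mkdir payload
head -c 200000 /dev/urandom > payload/data.bin
tar -czf demo_package.tar.gz payload

# Split into numbered parts: demo_package.tar.gz.00, .01, ...
split -b 65536 -d demo_package.tar.gz demo_package.tar.gz.

# Rejoin. The shell expands the glob in sorted order, so the parts
# are concatenated as .00, .01, .02, ...
cat demo_package.tar.gz.[0-9][0-9] > rejoined.tar.gz

# gzip -t tests the compressed stream and exits non-zero if it is
# truncated or corrupt; cmp confirms the rejoined file matches the original.
if gzip -t rejoined.tar.gz && cmp -s demo_package.tar.gz rejoined.tar.gz; then
  result="archive OK"
else
  result="archive corrupt"
fi
echo "$result"
```

For the real package only the gzip -t check applies, since there is no original file to compare against; running it on the rejoined archive before tar -xf is enough to catch an incomplete download.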

2.2 Kafka

Deploy Kafka in containers. Note that with the configuration below, Kafka cannot be accessed remotely (only from within the containers); modify it as needed.

Reference document: https://github.com/bitnami/containers/tree/main/bitnami/kafka

version: "2"


services:
  zookeeper:
    image: docker.io/bitnami/zookeeper:3.9
    ports:
      - "2181:2181"
    volumes:
      - "zookeeper_data:/bitnami"
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
  kafka1:
    image: docker.io/bitnami/kafka:3.4
    ports:
      - "9092:9092"
    volumes:
      - "kafka1_data:/bitnami"
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
    depends_on:
      - zookeeper
  kafka2:
    image: docker.io/bitnami/kafka:3.4
    ports:
      # Host port changed to 9093: two services cannot publish the same host port
      - "9093:9092"
    volumes:
      - "kafka2_data:/bitnami"
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
    depends_on:
      - zookeeper


volumes:
  zookeeper_data:
    driver: local
  kafka1_data:
    driver: local
  kafka2_data:
    driver: local
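
Once the containers are up, a quick way to confirm that a broker port is listening on the host is a plain TCP check. The sketch below is one way to do that. The helper name port_open is hypothetical, the 2-second timeout is an arbitrary choice, and 127.0.0.1:9092 assumes the port published in the compose file above. It only tests TCP reachability and does not speak the Kafka protocol.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 2 seconds.
# Relies on bash's built-in /dev/tcp redirection, so the check itself
# is run through an explicit "bash -c".
port_open() {
  local host=$1 port=$2
  timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Assumed target: the broker port published on the local host.
if port_open 127.0.0.1 9092; then
  echo "Kafka port 9092 is reachable"
else
  echo "Kafka port 9092 is not reachable"
fi
```

A successful check from the host still says nothing about remote clients, which is consistent with the note above that this setup is not reachable remotely without further changes.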