Concept
Network asset identification rules are defined in a rule file that is used to extract network asset information from large volumes of text or binary data. The rule file is written by hand and then loaded by a parser engine, which applies the asset extraction rules configured in the file to pull key information out of the data source.
The text or binary data usually comes from network traffic, system configuration files, and system logs.

Network asset information usually includes the IP address, port, service, product name, product version, product vendor, system resource path, and so on.
Background
In network security monitoring and defense, an accurate and extensible asset identification system is crucial. Network attack and defense is a protracted war, and as the saying goes, to win every battle one must first know oneself: only by understanding what needs to be protected can we protect it well.
Most commonly used asset management systems rely on manual, process-driven bookkeeping, which is no longer sufficient for today's increasingly large information systems. Manual management can only cover basic information such as IP address, operating system, port, service, business name, and responsible person. It struggles to keep track of deeper asset information such as service versions, system frameworks, and system resource paths. Yet this deeper information is often the most important for network security. When we know the service versions in use, we can quickly screen for, investigate, and fix severe vulnerabilities. When we know which frameworks are in use, we can filter detected attack events down to the ones that are actually relevant. When we know every resource path in the system, and even the parameter patterns, we can apply the zero trust principle and block every request that does not match the asset records.
We therefore want to solve the asset management problem with an automated asset detection and identification system that identifies and extracts asset information from traffic data in real time and keeps the asset information library up to date. Because asset information is complex and changes often, we also need a standardized rule format that security operations staff can write themselves, so that the system's identification capability can be extended dynamically.
The core of network security work is getting every detail right. There are no shortcuts, and asset identification is one small step along the way.
Proposal
The rule file uses XML, which is relatively easy to read. Extraction is based on regular expressions, and in particular on capture groups, which are used to extract values such as versions.
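As a minimal sketch of the capture-group idea, the snippet below pulls a product name and version out of a Server header. The sample header and the combined pattern are assumptions made for illustration, and Python spells named groups as `(?P<name>...)` rather than the `(?<name>...)` form used in the rule file.

```python
import re

# Hypothetical sample input: one HTTP response header line.
header = "Server: nginx/1.18.0"

# Named groups carry the extracted values; the pattern itself is illustrative.
pattern = re.compile(r"Server:\s*(?P<product>\w+)/(?P<version>\d+\.\d+\.\d+)")

match = pattern.search(header)
asset = match.groupdict() if match else {}
print(asset)
```

The named groups map directly onto asset fields, which is what makes the group feature a natural fit for version extraction.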
The supported data sources include:
HTTP response header formatted data
HTML formatted data
XML formatted data
JSON formatted data
Single-line text data.
The supported extraction scenarios include:
Extracting service name/service version/component name/component version from Server field in HTTP response header
Extracting framework from X-Powered-By field in HTTP response header
Extracting component information from Tomcat error information in HTTP response body
Extracting framework name/manufacturer/version from Maven configuration file pom.xml
<?xml version="1.0" encoding="UTF-8" ?>
<assets>
<!--
Open question: how should the same asset be handled when several rules identify it? In particular, when both a fuzzy extraction rule and a precise extraction rule match, should one take priority?
-->
  <asset start="Server:" end="\r\n" greedy="true">
    <vendor seq="1">
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product required="true" seq="2">
      <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
      <result groupName="product" groupIndex="1"/>
    </product>
    <version seq="3">
      <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version" groupIndex="1"/>
    </version>
  </asset>
  <asset condition="Server:" greedy="true">
    <vendor>
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product required="true">
      <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
      <result groupName="product" groupIndex="1"/>
    </product>
    <version>
      <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version" groupIndex="1"/>
    </version>
  </asset>
  <asset condition="Server:" greedy="false">
    <vendor>
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product>
      <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
      <result groupName="product" groupIndex="1">unknown</result>
    </product>
    <version>
      <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version" groupIndex="1">unknown</result>
    </version>
  </asset>
  <asset>
    <vendor>
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product>
      <condition fromStart="false">Server: nginx</condition>
      <result>nginx</result>
    </product>
    <version>
      <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version" groupIndex="1">1.1.1</result>
    </version>
  </asset>
  <asset condition="Server:">
    <vendor>
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product>
      <condition fromStart="false">nginx</condition>
      <result>nginx</result>
    </product>
    <version>
      <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version">unknown</result>
    </version>
  </asset>
  <asset condition="Server:">
    <vendor>
      <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
      <result groupName="vendor" groupIndex="1"/>
    </vendor>
    <product>
      <condition fromStart="false">nginx</condition>
      <result>nginx</result>
    </product>
    <version>
      <condition fromStart="false">nginx/(\d+\.\d+\.\d+)</condition>
      <result groupIndex="1">unknown</result>
    </version>
  </asset>
</assets>
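To make the role of the parser engine more concrete, here is a rough sketch of how such a rule file could be loaded and applied. It is not the actual engine: the trimmed-down rule file, the helper names `to_python_regex` and `apply_rules`, and the semantics noted in the comments (substring precondition, `required`, element text as a fallback) are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down rule file; CDATA keeps "<" legal inside the regex.
RULES = r"""<assets>
  <asset condition="Server:">
    <product required="true">
      <condition><![CDATA[Server:\s*(?<product>\w+)]]></condition>
      <result groupName="product"/>
    </product>
    <version>
      <condition><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
      <result groupName="version">unknown</result>
    </version>
  </asset>
</assets>"""


def to_python_regex(pattern):
    """Rewrite (?<name>...) as (?P<name>...), leaving lookbehinds untouched."""
    return re.sub(r"\(\?<(?![=!])", "(?P<", pattern)


def apply_rules(rules_xml, text):
    records = []
    for asset in ET.fromstring(rules_xml).iter("asset"):
        precondition = asset.get("condition")
        if precondition and precondition not in text:
            continue  # assumed: the asset-level condition is a plain substring test
        record, complete = {}, True
        for field in asset:
            result = field.find("result")
            match = re.search(to_python_regex(field.find("condition").text), text)
            if match:
                # Simplification: every result in RULES names its group.
                record[field.tag] = match.group(result.get("groupName"))
            elif field.get("required") == "true":
                complete = False  # assumed: a missing required field drops the asset
                break
            elif result.text:
                record[field.tag] = result.text  # assumed: element text is a fallback
        if complete and record:
            records.append(record)
    return records


print(apply_rules(RULES, "HTTP/1.1 200 OK\r\nServer: nginx/1.18.0\r\n\r\n"))
```

A real engine would also need to honor `start`, `end`, `greedy`, `seq`, and `groupIndex`, which this sketch leaves out.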
Seq cursor order matching mechanism
<asset start="Server:" end="\r\n" greedy="true">
  <vendor seq="1">
    <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <result groupName="vendor" groupIndex="1"/>
  </vendor>
  <product required="true" seq="2">
    <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
    <result groupName="product" groupIndex="1"/>
  </product>
  <version seq="3">
    <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
    <result groupName="version" groupIndex="1"/>
  </version>
</asset>
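The rule above does not spell out how the cursor behaves, so the following is only one possible reading, sketched under stated assumptions: `start` and `end` bound the region to search, fields are tried in ascending `seq` order, each successful match moves a cursor to the end of that match, and `fromStart="false"` means the next search begins at the cursor rather than at the start of the region. The function name `extract_in_order`, the step tuples, and the patterns are hypothetical.

```python
import re


def extract_in_order(text, start, end, steps):
    """steps: iterable of (seq, field_name, pattern, from_start) tuples."""
    begin = text.find(start)
    if begin < 0:
        return {}
    stop = text.find(end, begin)
    region = text[begin:stop if stop >= 0 else len(text)]

    cursor = 0
    found = {}
    for _seq, name, pattern, from_start in sorted(steps, key=lambda s: s[0]):
        # Assumed: from_start=False resumes searching at the cursor.
        match = re.compile(pattern).search(region, 0 if from_start else cursor)
        if match:
            found[name] = match.group(1)
            cursor = match.end()
    return found


RESPONSE = "HTTP/1.1 200 OK\r\nServer: nginx/1.18.0\r\nContent-Type: text/html\r\n\r\n"
STEPS = [
    (2, "version", r"/(\d+\.\d+\.\d+)", False),
    (1, "product", r"Server:\s*(\w+)", False),
]
print(extract_in_order(RESPONSE, "Server:", "\r\n", STEPS))
```

Under this reading, two fields that both anchor on `Server:` could not both match with `fromStart="false"`, which is one reason the exact cursor semantics deserve to be pinned down in the implementation.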
Group extraction mode
<asset>
  <vendor>
    <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <result groupName="vendor" groupIndex="1"/>
  </vendor>
  <product>
    <condition fromStart="false">Server: nginx</condition>
    <result>nginx</result>
  </product>
  <version>
    <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
    <result groupName="version" groupIndex="1">1.1.1</result>
  </version>
</asset>
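The result element combines `groupName`, `groupIndex`, and an optional text body, but the order in which they are consulted is not stated. The sketch below assumes one plausible order: named group first, then numbered group, then the element text. It also assumes that text without any group attribute is a literal that applies only when the condition matched. The function name `resolve_result` is hypothetical.

```python
import re


def resolve_result(match, group_name=None, group_index=None, text=None):
    wants_group = group_name is not None or group_index is not None
    if match is None:
        # Assumed: the text acts as a fallback only for group-based results.
        return text if wants_group else None
    if group_name is not None and group_name in match.re.groupindex:
        value = match.group(group_name)
        if value is not None:
            return value
    if group_index is not None and 0 < group_index <= match.re.groups:
        value = match.group(group_index)
        if value is not None:
            return value
    # Either a literal result, or a group-based result with nothing captured.
    return text


line = "Server: nginx/1.18.0"
named = re.search(r"nginx/(?P<version>\d+\.\d+\.\d+)", line)
print(resolve_result(named, group_name="version", group_index=1, text="1.1.1"))
```

With this order, a rule can name its group for readability while still working against a pattern that only has numbered groups.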
Greedy mode
<asset condition="Server:" greedy="true">
  <vendor>
    <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <result groupName="vendor" groupIndex="1"/>
  </vendor>
  <product required="true">
    <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
    <result groupName="product" groupIndex="1"/>
  </product>
  <version>
    <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
    <result groupName="version" groupIndex="1"/>
  </version>
</asset>
Discussion
Question 1
Should a protocol type attribute be added to asset identification rules?
protocol="http"
dataType="http"
Since the rules apply to more than just network data, the name dataType is recommended.
What is the role of the dataType field: is it a precondition for matching, a way to enable type-specific matching functions for the matches that follow, or both?
<asset start="Server:" end="\r\n" greedy="true" dataType="http">
  <vendor seq="1">
    <condition fromStart="false"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <result groupName="vendor" groupIndex="1"/>
  </vendor>
  <product required="true" seq="2">
    <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
    <result groupName="product" groupIndex="1"/>
  </product>
  <version seq="3">
    <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
    <result groupName="version" groupIndex="1"/>
  </version>
</asset>
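If dataType is treated as a precondition, the engine could select the applicable rules before running any regular expression. The sketch below illustrates only that reading. The sample rules, the helper `rules_for`, and the choice that a rule without dataType applies to every input are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file with only the attributes relevant to selection.
RULES = """<assets>
  <asset start="Server:" dataType="http"/>
  <asset start="&lt;dependency&gt;" dataType="xml"/>
  <asset start="version"/>
</assets>"""


def rules_for(rules_xml, data_type):
    """Return the asset rules that apply to the given data type."""
    root = ET.fromstring(rules_xml)
    # Assumed: a rule without dataType is generic and always applies.
    return [a for a in root.iter("asset") if a.get("dataType") in (None, data_type)]


print([rule.get("start") for rule in rules_for(RULES, "http")])
```

Selecting rules up front keeps HTTP-specific patterns from being evaluated against, say, a pom.xml file, which reduces both cost and false positives.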
Question 2 [Merged with question 1]
Should a data type attribute be added to asset identification rules?
Merged with Question 1.
Question 3 [Not considered]
Should a network direction attribute be added?
Is the design compatible with identification methods that need the request and the response together?
This approach would require specifying multiple input fields:
request: the data direction is client -> server
response: the data direction is server -> client
default: no direction is specified and the field is set to 'default'; 'default' matches 'response' first, and also covers the case of a single input
Since the data used for asset identification does not necessarily come from web traffic, the focus remains on extracting data from a single text. Multiple input fields would also break the seq cursor order matching mechanism, so this proposal is not being considered for now.
<asset start="Server:" end="\r\n" greedy="true" dataType="http">
  <vendor seq="1">
    <condition fromStart="false" field="response"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <result groupName="vendor" groupIndex="1"/>
  </vendor>
  <product required="true" seq="2">
    <condition fromStart="false"><![CDATA[Server:\s*(?<product>\w+)]]></condition>
    <result groupName="product" groupIndex="1"/>
  </product>
  <version seq="3">
    <condition fromStart="false"><![CDATA[nginx/(?<version>\d+\.\d+\.\d+)]]></condition>
    <result groupName="version" groupIndex="1"/>
  </version>
</asset>
Question 4 [Not considered yet]
Should condition support logical operations over multiple regular expressions?
The goal would be to improve accuracy and reduce false positives.
A combination of multiple regular expressions would support only the matching model, not the extraction model.
Since asset start="Server:" end="\r\n" already bounds the text, a single regular expression should be enough to meet the requirements. There is also no group extraction scheme for multiple regular expressions at present, so only the matching model could be supported. This proposal is therefore not being considered for now.
<vendor seq="1">
  <condition fromStart="false" field="response" expression="a+b|c">
    <condition fromStart="false" field="response"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
    <condition fromStart="false" field="response"><![CDATA[Server:\s*(?<vendor>\w+)]]></condition>
  </condition>
  <result>Manufacturer xxx</result>
</vendor>
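The expression syntax is not defined above, so the following sketch rests on a guess: `+` means AND, `|` means OR, `+` binds tighter than `|`, and each letter labels one regular expression. The labels, the sample patterns, and the function name `evaluate` are hypothetical. In line with the matching-only model, the function returns a boolean and extracts nothing.

```python
import re


def evaluate(expression, conditions, text):
    """Evaluate an expression such as "a+b|c" against text (match only)."""
    return any(
        all(re.search(conditions[label.strip()], text) for label in term.split("+"))
        for term in expression.split("|")
    )


# Hypothetical labelled conditions.
CONDITIONS = {
    "a": r"Server:",
    "b": r"nginx",
    "c": r"X-Powered-By:",
}

print(evaluate("a+b|c", CONDITIONS, "Server: nginx/1.18.0"))
```

Because several patterns may match at different positions, it is unclear which capture group should supply the value, which is why a fixed result such as "Manufacturer xxx" is the natural output of this model.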
Question 5
How should the same asset be handled when multiple rules identify it? In particular, when both a fuzzy extraction rule and a precise extraction rule match, should there be a priority between them?
Question 6 [Not considered yet]
Regular expressions can be difficult to write. Since asset extraction tends to follow recurring patterns, can rule authors avoid writing regular expressions by hand?
One option is to provide predefined regular expressions for commonly used extraction methods.
This has not been put into practice yet. The predefined expressions will be defined once actual usage requirements are clear.
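Purely as an illustration of what predefined expressions might look like, the sketch below lets a rule refer to a named pattern through a `{{name}}` placeholder. The placeholder syntax, the pattern names, and the helper `expand` are all assumptions, since nothing has been defined yet.

```python
import re

# Hypothetical library of predefined patterns.
PREDEFINED = {
    "semver": r"\d+\.\d+\.\d+",
    "ipv4": r"(?:\d{1,3}\.){3}\d{1,3}",
    "port": r"\d{1,5}",
}


def expand(template):
    """Replace each {{name}} placeholder with its predefined pattern."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: PREDEFINED[m.group(1)], template)


pattern = expand(r"nginx/(?P<version>{{semver}})")
match = re.search(pattern, "Server: nginx/1.18.0")
print(match.group("version"))
```

A rule author would then only write the literal context around the value and pick the value's shape from the library.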
Implementation
To be continued
