Basic terminology



Ramble

This is the first article in the author's notes on learning static security analysis. Later installments will cover specific topics such as taint analysis, data flow analysis, and control flow analysis, along with practical analyses of well-regarded open-source projects and an introduction to concrete implementation in DevSecOps.

One-sentence description

Static analysis is the process of detecting errors, vulnerabilities, security risks, and other potential issues by analyzing the structure, syntax, and semantics of source code with static analysis tools, without executing the program.




Basic terminology

Lexical Analysis

  • Decomposes source code into basic lexical units, or tokens. The lexical analyzer (lexer) performs this task.
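As a rough illustration of tokenization, the sketch below uses Python's standard `tokenize` module to split one assignment into tokens. The helper name `lex` and the sample statement are assumptions made for this example, and layout tokens such as newlines are filtered out for readability.

```python
import io
import tokenize

# Token kinds that only describe layout; dropped to keep the output short.
LAYOUT = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER,
          tokenize.INDENT, tokenize.DEDENT, tokenize.COMMENT}

def lex(source: str) -> list[tuple[str, str]]:
    """Split Python source into (token kind, token text) pairs."""
    reader = io.StringIO(source).readline
    return [(tokenize.tok_name[tok.type], tok.string)
            for tok in tokenize.generate_tokens(reader)
            if tok.type not in LAYOUT]

tokens = lex("d = (a + b) * c - a")
for kind, text in tokens:
    print(kind, text)
```

Identifiers come out as `NAME` tokens and operators and parentheses as `OP` tokens; a syntax analyzer then consumes this stream.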

Syntax Analysis

  • Converts the token stream into an abstract syntax tree (AST) or another intermediate representation for further analysis.

Semantic Analysis

  • Analyzes the source code against the language rules to capture its meaning and to detect possible errors, inconsistencies, and potential issues.

Abstract Syntax Tree (AST)

  • A data structure commonly used in programming language processing and static analysis. It represents the organization and grammatical relationships of the code as a tree
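To make the tree structure concrete, here is a minimal sketch that parses one assignment with Python's standard `ast` module and renders the right-hand side as nested tuples. The helper `shape` and the sample statement are illustrative assumptions, and only names, constants, and binary operations are handled.

```python
import ast

def shape(node: ast.expr):
    """Render an expression subtree as nested (operator, left, right) tuples."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, shape(node.left), shape(node.right))
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Constant):
        return node.value
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("d = (a + b) * c - a")
assign = tree.body[0]            # the single ast.Assign statement
rhs_shape = shape(assign.value)  # expression tree of the right-hand side
print(rhs_shape)
```

The parentheses of the source text do not appear as nodes; operator precedence and grouping are encoded purely by the nesting of the tree.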

Intermediate Representation (IR)

  • An intermediate form of code used in compilers and interpreters. It is an abstract representation produced after the source code has gone through syntax and semantic analysis, and it serves as the basis for optimization, transformation, and target code generation

  • Largely independent of any specific hardware platform or programming language, which makes it easier for compilers and interpreters to optimize code and to support multiple platforms

Three-address code

  • One form of intermediate code. Each instruction has at most three operands (addresses): usually two source operands and one destination that stores the result

  • Each instruction consists of an operator, two source operands (operand1 and operand2), and a result operand, in the form result = operand1 operator operand2. For example, d = (a + b) * c - a becomes:

t1 = a + b
t2 = t1 * c
d = t2 - a
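The lowering from a nested expression to three-address code can be sketched as follows. This is a minimal illustration under stated assumptions, not a compiler component: it uses Python's `ast` module as the front end, handles only names, constants, and the four basic arithmetic operators, and the function name `to_three_address` is made up for this example.

```python
import ast
import itertools

# Operators supported by this sketch, mapped to their textual form.
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def to_three_address(source: str) -> list[str]:
    """Lower one assignment `target = expression` into three-address code."""
    assign = ast.parse(source).body[0]
    if not (isinstance(assign, ast.Assign) and isinstance(assign.targets[0], ast.Name)):
        raise ValueError("expected a simple assignment such as 'd = a + b'")
    temp_ids = itertools.count(1)
    code: list[str] = []

    def operand(node: ast.expr) -> str:
        """Return an operand name, emitting a temporary for nested operations."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp):
            left, right = operand(node.left), operand(node.right)
            temp = f"t{next(temp_ids)}"
            code.append(f"{temp} = {left} {OPERATORS[type(node.op)]} {right}")
            return temp
        raise NotImplementedError(type(node).__name__)

    target, value = assign.targets[0].id, assign.value
    if isinstance(value, ast.BinOp):
        # The outermost operation writes straight into the assignment target.
        left, right = operand(value.left), operand(value.right)
        code.append(f"{target} = {left} {OPERATORS[type(value.op)]} {right}")
    else:
        code.append(f"{target} = {operand(value)}")
    return code

tac = to_three_address("d = (a + b) * c - a")
print("\n".join(tac))
```

Each nested operation gets its own temporary (`t1`, `t2`, ...), so every emitted instruction has one operator and at most three addresses.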

Static Single Assignment (SSA)

  • One form of intermediate code in which each variable is assigned exactly once. Every new assignment introduces a new variable name, usually the original name plus a unique identifier such as a version number, which simplifies data flow analysis, optimization, and code generation.

x1 = 1
y1 = x1 + 2
x2 = 3
z1 = x2 * y1
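The renaming step can be sketched for straight-line code as below. This is a simplified illustration: it assumes one `target = expression` statement per line and no branches, so it never needs the φ (phi) functions that full SSA construction inserts where control flow paths merge. The function name `to_ssa` is an assumption for this example.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def to_ssa(lines: list[str]) -> list[str]:
    """Rename variables in straight-line code so each name is assigned once."""
    version: dict[str, int] = {}  # latest version number per variable
    result = []
    for line in lines:
        target, expression = (part.strip() for part in line.split("=", 1))

        def latest(match: re.Match) -> str:
            # Uses refer to the most recent definition; program inputs that
            # were never assigned keep their original name.
            name = match.group(0)
            return f"{name}{version[name]}" if name in version else name

        expression = IDENTIFIER.sub(latest, expression)
        version[target] = version.get(target, 0) + 1
        result.append(f"{target}{version[target]} = {expression}")
    return result

ssa = to_ssa(["x = 1", "y = x + 2", "x = 3", "z = x * y"])
print("\n".join(ssa))
```

Because the second assignment to `x` becomes `x2`, a later use can be matched to exactly one definition by name alone.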

Data Flow Graph (DFG)

  • Describes the data flow and data dependency relationships in the program

  • Nodes represent operations or calculations in the program, such as variable definitions, assignments, and arithmetic operations. Edges represent the transmission of data, that is, the path along which data flows. Each edge is directed and indicates the direction of the data flow

  • Used for analyses such as constant propagation, copy propagation, and live variable analysis
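A minimal sketch of how such edges can be derived for straight-line code: each edge runs from the statement that defines a variable to a later statement that uses it. The statement format (`target = expression`) and the helper name `data_flow_edges` are assumptions for illustration; branches and loops are not handled.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_]\w*")

def data_flow_edges(lines: list[str]) -> list[tuple[int, int, str]]:
    """Return (defining line, using line, variable) edges, 0-indexed."""
    last_definition: dict[str, int] = {}
    edges = []
    for index, line in enumerate(lines):
        target, expression = (part.strip() for part in line.split("=", 1))
        for name in IDENTIFIER.findall(expression):
            if name in last_definition:
                edges.append((last_definition[name], index, name))
        # Record the definition only after its uses, so `x = x + 1`
        # depends on the previous definition of x.
        last_definition[target] = index
    return edges

edges = data_flow_edges(["t1 = a + b", "t2 = t1 * c", "d = t2 - a"])
print(edges)
```

Variables such as `a`, `b`, and `c` that are never assigned in the snippet are treated as inputs and produce no edge.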

Control Flow Graph (CFG)

  • Describes the control flow of the program, that is, the possible execution orders and conditional branches

  • Nodes represent basic blocks, each containing a series of sequentially executed statements. Edges represent transfers of control, that is, jumps between basic blocks. An edge that leaves a conditional branch is labeled with the condition under which the jump is taken

  • Used to analyze execution paths, conditional branches, and loop structures

Basic Block

  • A maximal sequence of consecutive statements with a single entry point and a single exit point: control can enter only at the first statement and leave only at the last
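Splitting code into basic blocks and connecting them into a CFG can be sketched with the classic "leader" rule: a block starts at the first instruction, at every jump target, and right after every jump. The instruction format (`Instr`) and the sample loop are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    text: str
    label: Optional[str] = None   # label attached to this instruction
    jump: Optional[str] = None    # label this instruction may jump to
    falls_through: bool = True    # False for unconditional jumps and returns

def split_basic_blocks(program: list[Instr]) -> list[list[int]]:
    """Group instruction indices into basic blocks using the leader rule."""
    label_at = {ins.label: i for i, ins in enumerate(program) if ins.label}
    leaders = {0}
    for i, ins in enumerate(program):
        if ins.jump is not None:
            leaders.add(label_at[ins.jump])          # jump target starts a block
        if (ins.jump is not None or not ins.falls_through) and i + 1 < len(program):
            leaders.add(i + 1)                       # so does the next instruction
    starts = sorted(leaders)
    return [list(range(a, b)) for a, b in zip(starts, starts[1:] + [len(program)])]

def build_cfg(program: list[Instr], blocks: list[list[int]]) -> set[tuple[int, int]]:
    """Return CFG edges as (from block, to block) pairs."""
    label_at = {ins.label: i for i, ins in enumerate(program) if ins.label}
    block_of = {block[0]: number for number, block in enumerate(blocks)}
    edges = set()
    for number, block in enumerate(blocks):
        last = program[block[-1]]
        if last.jump is not None:
            edges.add((number, block_of[label_at[last.jump]]))
        if last.falls_through and number + 1 < len(blocks):
            edges.add((number, number + 1))
    return edges

# Hypothetical loop that sums 0..n-1.
program = [
    Instr("i = 0"),
    Instr("if i >= n goto L2", label="L1", jump="L2"),
    Instr("s = s + i"),
    Instr("i = i + 1"),
    Instr("goto L1", jump="L1", falls_through=False),
    Instr("return s", label="L2", falls_through=False),
]
blocks = split_basic_blocks(program)
cfg_edges = build_cfg(program, blocks)
print(blocks, sorted(cfg_edges))
```

The back edge from the loop body to the loop header is what a CFG-based analysis uses to recognize the loop.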

Call Graph (CG)

  • Used to analyze and understand the function call flow of the program

  • Nodes represent functions, and edges represent call relationships between functions

  • Shows which functions call each other directly or indirectly, which makes it possible to trace call paths, follow the execution flow of the program, and analyze dependencies between functions
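A small sketch of call graph construction over Python source, using the standard `ast` module. It records only direct calls by simple name (no methods, no calls through variables), and the sample functions `handler`, `query`, `render`, and `execute` are hypothetical.

```python
import ast

def call_graph(source: str) -> dict[str, set[str]]:
    """Map each function name to the names it calls directly."""
    graph: dict[str, set[str]] = {}
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            callees = graph.setdefault(func.name, set())
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    callees.add(node.func.id)
    return graph

def reachable(graph: dict[str, set[str]], start: str) -> set[str]:
    """Return every function reachable from `start`, directly or indirectly."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        for callee in graph.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen

SAMPLE = """
def handler(request):
    return render(query(request))

def query(value):
    return execute(value)

def render(rows):
    return str(rows)
"""
graph = call_graph(SAMPLE)
print(graph)
print(reachable(graph, "handler"))
```

The reachability query is the kind of question a scanner asks when it needs to know whether an entry point can eventually reach a dangerous function.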

Program Dependence Graph (PDG)

  • Represents both the data dependencies and the control dependencies of a program in a single graph

  • Nodes represent statements in the program, while edges represent the data or control dependency relationships between statements

System Dependence Graph (SDG)

  • In program analysis, an extension of the PDG to programs that consist of multiple procedures, used for interprocedural analyses such as program slicing

  • Contains one dependence graph per procedure. Additional edges connect each call site to the called procedure and model the passing of parameters and return values



Code Property Graph (CPG)


  • An intermediate representation of source code that merges the AST, CFG, and PDG into a single graph. It is a recent graph representation that is widely used in static source code vulnerability analysis






Analysis Methods

Taint Analysis

  • Tracks the propagation and use of untrusted or sensitive data through the program to detect potential data leaks, injection attacks, and other security vulnerabilities

  • Can be abstracted as a triple <sources, sinks, sanitizers>:

    • Source: the taint source, where untrusted or confidential data is introduced into the system

    • Sink: the taint sink, where a security-sensitive operation is performed (a violation of data integrity) or private data is leaked to the outside world (a violation of data confidentiality)

    • Sanitizer: harmless processing, such as data encryption or the removal of harmful operations, after which the propagation of the data no longer threatens the information security of the software system

  • Taint analysis checks whether data introduced at a taint source can reach a taint sink without passing through a sanitizer. If it cannot, the system is information-flow secure; otherwise, the system has a security issue such as private data leakage or a dangerous data operation

  • Simplified Taint Analysis Process

[Figure: simplified taint analysis process]


  • Explicit Flow Analysis

      • Analyzes how taint marks propagate along data dependencies between variables, for example through assignments and function arguments

  • Implicit Flow Analysis

      • Analyzes how taint marks propagate along control dependencies between variables, for example when a tainted value decides which branch assigns another variable


  • Harmless Processing (Sanitization)

    • After data has been processed by a sanitizing module, the data itself no longer carries sensitive information, or operations on the data no longer pose a threat to the system

    • For example, an input validation module should be identified as a harmless processing module, as should mechanisms such as XSS Auditor and CSRF Protect
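The source, sink, and sanitizer triple can be sketched as a tiny explicit-flow taint tracker over straight-line code. Everything here is an assumption for illustration: the statement format `(target, function, arguments)`, and the function names `read_input`, `run_query`, `escape`, and `concat`. Implicit flows through control dependencies are not tracked.

```python
SOURCES = {"read_input"}    # introduce untrusted data
SINKS = {"run_query"}       # security-sensitive operations
SANITIZERS = {"escape"}     # harmless processing

def find_taint_flows(program):
    """Return (line, sink, variable) for every tainted value reaching a sink."""
    tainted: set[str] = set()
    findings = []
    for line, (target, func, args) in enumerate(program, start=1):
        if func in SINKS:
            findings.extend((line, func, arg) for arg in args if arg in tainted)
        if target is None:
            continue
        if func in SOURCES:
            is_tainted = True
        elif func in SANITIZERS:
            is_tainted = False
        else:
            # Explicit flow: the result is tainted if any argument is tainted.
            is_tainted = any(arg in tainted for arg in args)
        if is_tainted:
            tainted.add(target)
        else:
            tainted.discard(target)
    return findings

# Each statement: (assigned variable or None, called function, argument variables).
program = [
    ("user", "read_input", []),                  # 1: user = read_input()
    ("query", "concat", ["prefix", "user"]),     # 2: query = concat(prefix, user)
    (None, "run_query", ["query"]),              # 3: run_query(query)
    ("clean", "escape", ["user"]),               # 4: clean = escape(user)
    ("safe_query", "concat", ["prefix", "clean"]),
    (None, "run_query", ["safe_query"]),         # 6: run_query(safe_query)
]
findings = find_taint_flows(program)
print(findings)
```

Only the call on line 3 is reported: the value reaching the sink on line 6 has passed through the sanitizer, so its taint mark was cleared.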


Symbolic Execution

  • Automatically explores the possible execution paths of a program. Concrete input values are replaced with symbolic values, the conditions collected on these symbolic values are passed to a constraint solver, and the solver generates concrete inputs that satisfy the constraints and therefore drive execution down different program paths

  • In short: for each path that reaches a given program point, determine which input values lead there

  • Key Concepts and Steps

    • Symbolic Input: Replace input values in the program with symbolic values, where each symbolic value represents a class of possible concrete values. For example, a symbolic variable x stands in for a concrete input value.

    • Symbolic Execution Path: Explore paths using the symbolic values. Conditional statements in the program (such as if statements and loop conditions) determine the different execution paths.

    • Constraint Generation: During symbolic execution, collect the constraints along each path; together they describe the conditions under which the path is taken. For example, the condition of an if statement is a constraint.

    • Constraint Solving: Use a constraint solver to solve the collected constraints and obtain concrete input values that satisfy them.

    • Path Coverage and Error Detection: Symbolic execution explores multiple paths and covers different execution scenarios, which helps find potential errors and vulnerabilities. For example, unreachable code or an error condition found on a path may indicate a program error.

  • Symbolic Execution Process

[Figure: symbolic execution process]
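The steps above can be sketched end to end on a toy program. This is a simplified illustration under stated assumptions: the program is hand-written as a decision tree rather than parsed from real code, and the "solver" is a brute-force search over a small integer range standing in for a real constraint solver. All names (`PROGRAM`, `explore`, `solve`) are made up for this example.

```python
from itertools import product

# Hypothetical program under test, written as a decision tree:
#   if x > 10:
#       if x + y == 15: "bug" else: "ok"
#   else: "ok"
# A node is ((condition text, predicate), then-branch, else-branch) or a leaf label.
PROGRAM = (
    ("x > 10", lambda x, y: x > 10),
    (("x + y == 15", lambda x, y: x + y == 15), "bug", "ok"),
    "ok",
)

def explore(node, path=()):
    """Yield (leaf label, path constraints) for every path through the tree."""
    if isinstance(node, str):
        yield node, path
        return
    (text, predicate), then_branch, else_branch = node
    # Fork at the branch: one path assumes the condition, the other negates it.
    yield from explore(then_branch, path + ((text, predicate, True),))
    yield from explore(else_branch, path + ((text, predicate, False),))

def solve(path, domain=range(-20, 21)):
    """Brute-force stand-in for a constraint solver over a small domain."""
    for x, y in product(domain, repeat=2):
        if all(predicate(x, y) == expected for _, predicate, expected in path):
            return {"x": x, "y": y}
    return None  # no input in the domain satisfies the path constraints

paths = [(label, solve(path)) for label, path in explore(PROGRAM)]
bug_inputs = [inputs for label, inputs in paths if label == "bug"]
print(paths)
```

A real engine would hand the collected constraints to an SMT solver instead of enumerating values, but the structure is the same: fork at each branch, accumulate constraints, and solve once per path.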


Pointer Analysis

  • Infers the points-to relationships of pointer variables, that is, the set of objects or addresses that each pointer variable may point to

  • Helps detect potential memory safety issues, such as null pointer dereferences and wild pointer references
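One common way to compute points-to sets is an inclusion-based, flow-insensitive analysis in the style of Andersen's algorithm. The sketch below is a minimal version of that idea, not a description of any particular tool; the four statement kinds and the tuple format are assumptions for this example.

```python
from collections import defaultdict

def points_to(statements):
    """Compute may-point-to sets by iterating the rules to a fixed point.

    Statement kinds (left, right are variable names):
      ("addr",  p, a)  p = &a
      ("copy",  p, q)  p = q
      ("load",  p, q)  p = *q
      ("store", p, q)  *p = q
    """
    pts = defaultdict(set)

    def add(var, targets):
        """Merge targets into pts[var]; report whether anything changed."""
        before = len(pts[var])
        pts[var] |= targets
        return len(pts[var]) != before

    changed = True
    while changed:
        changed = False
        for kind, left, right in statements:
            if kind == "addr":
                changed |= add(left, {right})
            elif kind == "copy":
                changed |= add(left, set(pts[right]))
            elif kind == "load":
                for obj in list(pts[right]):
                    changed |= add(left, set(pts[obj]))
            elif kind == "store":
                for obj in list(pts[left]):
                    changed |= add(obj, set(pts[right]))
    return {var: targets for var, targets in pts.items() if targets}

result = points_to([
    ("addr", "p", "a"),    # p = &a
    ("addr", "q", "b"),    # q = &b
    ("copy", "p", "q"),    # p = q
    ("addr", "r", "p"),    # r = &p
    ("load", "s", "r"),    # s = *r
    ("addr", "t", "c"),    # t = &c
    ("store", "r", "t"),   # *r = t
])
print(result)
```

Because statement order is ignored, `p` is reported as possibly pointing to `a`, `b`, and `c` at once. The result is a safe over-approximation, at the cost of precision.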


Analysis Terms

Interprocedural Analysis

  • Analyzes program behavior and properties across functions (procedures). It tracks the call relationships between functions, passes information from one function to another, and analyzes the behavior of the program as a whole

Intraprocedural Analysis

  • Analyzes program behavior and properties within a single function (procedure). It focuses on the data flow, control flow, and semantic structure inside the function to identify errors, vulnerabilities, and potential issues within it


Context-Sensitive Analysis

  • Distinguishes calls to the same function made from different call sites and infers program behavior according to the specific calling context, which yields a more precise analysis of the program's semantics and behavior

Context-Insensitive Analysis

  • Treats each call and return as a 'goto' operation, ignoring the call site and the values of function arguments. It is faster but less precise, which suits quick detection of potential security vulnerabilities and preliminary risk assessment
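The difference shows up as soon as one helper is called from two places. The sketch below models two call sites of a hypothetical `identity` function that simply returns its argument, and compares a single merged summary per function with a per-call-site result. The data format and all names are assumptions for illustration.

```python
# Hypothetical call sites of `def identity(v): return v`,
# written as (result variable, called function, argument variable).
CALLS = [
    ("a", "identity", "tainted_input"),   # a = identity(tainted_input)
    ("b", "identity", "constant"),        # b = identity(constant)
]
TAINTED = {"tainted_input"}

def context_insensitive(calls, tainted):
    """One merged summary per function: all call sites share the same facts."""
    parameter_tainted: dict[str, bool] = {}
    for _, func, arg in calls:
        parameter_tainted[func] = parameter_tainted.get(func, False) or arg in tainted
    # The merged result flows back to every caller of the function.
    return {target for target, func, _ in calls if parameter_tainted[func]}

def context_sensitive(calls, tainted):
    """Analyze the callee separately for each call site."""
    return {target for target, _, arg in calls if arg in tainted}

merged = context_insensitive(CALLS, TAINTED)
precise = context_sensitive(CALLS, TAINTED)
print(merged, precise)
```

The context-insensitive variant reports both `a` and `b` as tainted, a false positive for `b`, because the arguments from both call sites are merged before the result is returned.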

Flow-Insensitive Analysis

  • Does not consider the execution order of statements. Statements are processed according to their physical position, from top to bottom, and branches in the program are ignored, so a single result is computed for the whole program rather than one per program point

Flow-Sensitive Analysis

  • Considers the possible execution orders of program statements, which usually requires the program's control flow graph (CFG)
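A three-statement example is enough to show the gap between the two. In the sketch below a variable is first assigned tainted data and then overwritten with a constant before it reaches a sink. The statement format and the reserved words `source`, `const`, and `sink` are assumptions for this example.

```python
# Hypothetical straight-line program, written as (target, value):
#   ("x", "source")  x = source()   tainted assignment
#   ("x", "const")   x = "safe"     overwrites x with a constant
#   ("sink", "x")    sink(x)
PROGRAM = [("x", "source"), ("x", "const"), ("sink", "x")]

def flow_insensitive(program) -> bool:
    """Ignore statement order: a variable is tainted if any assignment taints it."""
    tainted = {target for target, value in program if value == "source"}
    changed = True
    while changed:  # propagate copies until nothing changes
        changed = False
        for target, value in program:
            if target != "sink" and value in tainted and target not in tainted:
                tainted.add(target)
                changed = True
    return any(target == "sink" and value in tainted for target, value in program)

def flow_sensitive(program) -> bool:
    """Follow statement order: a later assignment replaces the earlier fact."""
    tainted: set[str] = set()
    alarm = False
    for target, value in program:
        if target == "sink":
            alarm = alarm or value in tainted
        elif value == "source" or value in tainted:
            tainted.add(target)
        else:
            tainted.discard(target)
    return alarm

print(flow_insensitive(PROGRAM), flow_sensitive(PROGRAM))
```

The flow-insensitive version raises an alarm because `x` is tainted somewhere in the program; the flow-sensitive version sees that `x` was overwritten before the sink and stays silent.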


Static Analysis Process

Traditional Basic Process

[Figure: traditional static analysis process]

Analysis Process Based on Learning Algorithms

[Figure: static analysis process based on learning algorithms]


Basic Scanning Principle

[Figure: basic scanning principle]



