GitLab CI integration with comprehensive checkpoint capabilities for security scanning

At many internet companies, a privately deployed GitLab instance is the most common way to host internal project code.

GitLab is a powerful platform. Beyond Git hosting, it offers comprehensive CI/CD capabilities that let R&D colleagues handle code submission, compilation, and deployment in one place, which greatly simplifies the work of integrating separate platforms in a DevOps process.

The most important of these capabilities is GitLab CI. It uses a YAML configuration file to drive a project's entire workflow, with no need to configure an additional platform such as Jenkins.

Today we will discuss how to add static code security scanning, with the ability to enforce security checkpoints, to projects that use GitLab CI.

What is GitLab CI

As the title suggests, let's first introduce the powerful GitLab CI technology.

GitLab CI (Continuous Integration) is a continuous integration/continuous deployment solution provided by GitLab, which can help development teams automate the building, testing, and deployment of applications. With GitLab CI, development teams can automatically build, test, and deploy applications when code changes occur, thereby improving development efficiency and software quality.

GitLab CI defines a series of Jobs (tasks) based on the .gitlab-ci.yml file. Each Job contains one or more specific steps, such as compiling code, running tests, and packaging applications. After a Job is completed, the execution result can be used to decide whether to continue with the next Job or terminate the entire process.

GitLab CI provides many useful features, such as parallel builds, containerized builds, custom environment variables, and report analysis. It also supports multiple languages and frameworks, including Java, Python, Node.js, Ruby, and containerization technologies such as Docker and Kubernetes.

Using GitLab CI can improve development efficiency, reduce manual operations, improve code quality and reliability, and facilitate management and maintenance. At the same time, GitLab CI is tightly integrated with GitLab, and CI pipelines can be viewed and managed through GitLab's interface, making it more convenient.

Let's practice the use of GitLab CI.

What is GitLab Runner

GitLab Runner is responsible for executing GitLab CI tasks. We need to configure a GitLab Runner for the GitLab platform before we can use GitLab CI. For details, see https://docs.gitlab.com/runner/.

Use Case Demonstration

We have a Java project on the GitLab platform called ProjectJava. We want to use GitLab CI to handle the project's testing, compilation, deployment, and other work.

First, we create a .gitlab-ci.yml configuration file with the following content:

stages:                 # Define multiple stages
  - build               # Building
  - test                # Testing
  - deploy              # Deployment

build_job:              # Define a build task
  stage: build          # Specify the stage
  script:
    - mvn package       # Execute command: build the application

test_job:               # Define a test task
  stage: test           # Specify the stage
  script:
    - mvn test          # Execute command: run unit tests

deploy_job:             # Define a deployment task
  stage: deploy         # Specify the stage
  script:
    - ./deploy.sh         # Execute command: invoke script to deploy the application
  only:
    - master            # Execute only when the master branch is committed

When we commit project code, GitLab automatically reads the .gitlab-ci.yml configuration file in the repository root and executes the instructions in it.

GitLab CI has two core concepts: stage and job.

As mentioned earlier, GitLab CI is composed of a series of jobs, where a job is a unit of task execution. When a job runs, however, is determined by its stage.

The .gitlab-ci.yml file above contains the following code:

stages:                 # Define multiple stages
  - build               # Building
  - test                # Testing
  - deploy              # Deployment

That is, the project defines three custom stages, representing the three phases of project execution.

Each task whose name ends in _job carries a stage key, indicating the stage in which the task runs.

So the above configuration executes in this order: build_job, then test_job, then deploy_job.
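The ordering rule can be illustrated with a small Python sketch. It is only a model of the example configuration, not GitLab's scheduler; the STAGES and JOBS structures and the execution_order helper are names made up for this illustration.

```python
# Hypothetical model of the example pipeline: the declared stage order,
# plus the stage each job belongs to (deliberately listed out of order).
STAGES = ["build", "test", "deploy"]
JOBS = {"deploy_job": "deploy", "test_job": "test", "build_job": "build"}


def execution_order(stages, jobs):
    """Group job names by stage, following the order in which stages are declared."""
    return [
        (stage, sorted(name for name, job_stage in jobs.items() if job_stage == stage))
        for stage in stages
    ]


for stage, names in execution_order(STAGES, JOBS):
    print(f"{stage}: {', '.join(names)}")
```

The order of the job definitions does not matter; only the position of each job's stage in the stages list does.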

By customizing stages and jobs this way, we can implement whatever functionality we want. GitLab CI syntax covers much more than this; for details, see:

https://docs.gitlab.com/ee/ci/quick_start/

After configuring .gitlab-ci.yml, we commit the project code to the GitLab platform and open the Pipeline view to see that our tasks have run.

If R&D uses GitLab CI to compile and deploy everything, how should we integrate security scanning?

In other words, we now have an independent code security scanning engine. How should we integrate it into these projects to help R&D solve security issues?

Basic GitLab CI integration with security scanning

Generally, we do this by adding a security scanning Job.

As described above, GitLab CI is organized around Stages and Jobs, so we can add a stage named secscan to serve as our security scanning node.

stages:                 # Define multiple stages
  - build               # Building
  - secscan             # Security scanning
  - test                # Testing
  - deploy              # Deployment

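In this scanning node, we pass the relevant information to the code scanning engine, which performs the scan.

We can name the Job secscan-job and write it like this: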

secscan-job:
  stage: secscan
  script:
    - export MULT_COMMIT_BRANCH=${CI_COMMIT_BRANCH}
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}; fi
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_COMMIT_TAG}; fi
    - python3 /home/agent/gitlab_secscan.py
      --gitUrl "${CI_PROJECT_URL}.git"
      --gitCommitId ${CI_COMMIT_SHA}
      --gitBranch $MULT_COMMIT_BRANCH
      --gitProjectPath ${CI_PROJECT_PATH}
      --url ${CI_PIPELINE_URL}
      --users ${GITLAB_USER_LOGIN}
      --pipelineId ${CI_PIPELINE_ID}

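GitLab CI provides a large number of predefined environment variables; for the full list, see https://docs.gitlab.com/ee/ci/variables/predefined_variables.html.

After the script section collects the project information for the current commit, it runs the /home/agent/gitlab_secscan.py script to process that information.

Where does this script live?

As mentioned earlier, GitLab CI tasks are all executed by GitLab Runner, which can run on a physical machine, in a Docker container, or even in a Kubernetes environment.

So the script should be placed in the GitLab Runner environment, where it is available whenever the Job executes.

The content of the script is not the focus of this article: it simply receives these parameters and passes them to the scanning engine for a security scan.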
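For orientation only, here is a minimal sketch of what the argument handling in such a script might look like. The flag names are taken from the secscan-job above; the JSON payload shape and the helper names parse_args and build_scan_request are assumptions, and the call to the scanning engine is left out because the article does not describe its API.

```python
import argparse
import json

# Flags passed by secscan-job in .gitlab-ci.yml
FLAGS = ("gitUrl", "gitCommitId", "gitBranch", "gitProjectPath",
         "url", "users", "pipelineId")


def parse_args(argv=None):
    """Parse the command-line flags that the CI job passes to the script."""
    parser = argparse.ArgumentParser(description="Forward GitLab CI metadata to a scanner")
    for flag in FLAGS:
        parser.add_argument(f"--{flag}", required=True)
    return parser.parse_args(argv)


def build_scan_request(args):
    """Serialize the parsed flags as JSON (hypothetical payload format)."""
    return json.dumps(vars(args), sort_keys=True)


# Example invocation with made-up values
example = parse_args([
    "--gitUrl", "https://gitlab.example.com/group/ProjectJava.git",
    "--gitCommitId", "0123abcd",
    "--gitBranch", "master",
    "--gitProjectPath", "group/ProjectJava",
    "--url", "https://gitlab.example.com/group/ProjectJava/-/pipelines/1",
    "--users", "alice",
    "--pipelineId", "1",
])
print(build_scan_request(example))
```

A real script would send this payload to the scanning engine and act on the response.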

Once the .gitlab-ci.yml above is in place, committing code triggers the security scanning Job.

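Security checkpoints

If there is no need to block the process over security problems, the configuration above is sufficient. The scan request is sent to the SAST engine without affecting the Pipeline's execution or business development. The security team analyzes the scan results manually or automatically, creates Jira tickets, and follows up on vulnerability fixes.

But without security checkpoints, can it still be called DevSecOps? And what would "shifting security left" even mean?

You might object that checkpoints bring false positives, business impact, and so on. That is outside the scope of this article; we can discuss it another time.

Suppose that when a serious security problem is found, such as use of the log4j2 component, we need to stop the entire pipeline and let it continue only after the business team has fixed the vulnerability. What should we do?

Implementing a checkpoint with GitLab CI is fairly simple. The principle: if a Job returns a non-zero exit code while running, that Job stops and is marked as failed, and subsequent Jobs are blocked.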
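The principle can be mimicked locally with a short Python sketch that runs commands in sequence and stops at the first non-zero exit code. This is a simplified model, not how GitLab Runner is implemented; run_steps and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, argv) steps in order; after the first failure, skip the rest.

    Returns a list of (name, status), where status is "passed", "failed" or "skipped".
    """
    results = []
    blocked = False
    for name, argv in steps:
        if blocked:
            results.append((name, "skipped"))
            continue
        code = subprocess.run(argv).returncode
        if code == 0:
            results.append((name, "passed"))
        else:
            results.append((name, "failed"))
            blocked = True
    return results


# Stand-in commands: a Python one-liner plays the role of each job's script.
OK_CMD = [sys.executable, "-c", "raise SystemExit(0)"]
FAIL_CMD = [sys.executable, "-c", "raise SystemExit(255)"]

print(run_steps([("build_job", OK_CMD), ("secscan-job", FAIL_CMD), ("test_job", OK_CMD)]))
```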

Let's try it:

secscan-job:
  stage: secscan
  script:
    - echo "I am done!"
    - exit 255

Here we simply simulate returning exit code 255. Running the pipeline shows that secscan-job fails and the rest of the pipeline is blocked.

We can then make this decision inside our gitlab_secscan.py script: if the scan finds a security vulnerability, the script exits with a non-zero code.
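As a rough sketch of that decision, assume the scanning engine returns a list of findings with a severity field. The result format, the severity names, and the exit_code_for helper are all assumptions made for illustration.

```python
BLOCKING_SEVERITIES = {"critical"}  # assumption: only critical findings block the pipeline
BLOCK_EXIT_CODE = 255               # same exit code as in the simulated failure


def exit_code_for(findings):
    """Return 255 if any finding has a blocking severity, otherwise 0."""
    if any(f.get("severity") in BLOCKING_SEVERITIES for f in findings):
        return BLOCK_EXIT_CODE
    return 0


# Hypothetical scan result; a real script would receive this from the scan engine.
findings = [{"rule": "log4j2-component-in-use", "severity": "critical"}]
print(exit_code_for(findings))
```

The real script would finish by passing this value to sys.exit(), so that the Job fails exactly when a blocking finding exists.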

Optimized GitLab CI integration with security scanning

Writing secscan-job into a project's .gitlab-ci.yml looks fine, but as security staff we face tens of thousands of projects that need security scanning. What should we do?

Ask every R&D team to add the secscan-job task to their own .gitlab-ci.yml?

From R&D's point of view, adding security scanning is extra trouble. If a team simply does not add it, how would we even find out?

And even if they do add it, what about later changes? Would every team have to modify its configuration again?

Any change would require the cooperation of all R&D teams, which is too disruptive and hard to carry out.

If the number of projects is not too large, we can improve the basic approach with GitLab CI's include syntax. Official documentation: https://docs.gitlab.com/ee/ci/yaml/includes.html.

Much like include in PHP, GitLab CI's include pulls in shared templates, which solves the problem of managing the same configuration in one place.

We move the common part of the basic approach into a shared template:

http://gitlab.xxx.com/common/gitlab_ci_template/.base_gitlab_ci.yml

secscan-job:
  stage: secscan
  script:
    - export MULT_COMMIT_BRANCH=${CI_COMMIT_BRANCH}
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}; fi
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_COMMIT_TAG}; fi
    - python3 /home/agent/gitlab_secscan.py
      --gitUrl "${CI_PROJECT_URL}.git"
      --gitCommitId ${CI_COMMIT_SHA}
      --gitBranch $MULT_COMMIT_BRANCH
      --gitProjectPath ${CI_PROJECT_PATH}
      --url ${CI_PIPELINE_URL}
      --users ${GITLAB_USER_LOGIN}
      --pipelineId ${CI_PIPELINE_ID}

Then include this template in each subproject:

include:
  - project: 'common/gitlab_ci_template'  # Project name
    ref: master                            # Branch
    file: '/.base_gitlab_ci.yml'           # Common configuration file, path relative to the repository root

stages:                 # Define multiple stages
  - build               # Building
  - test                # Testing
  - secscan             # Security scan nodes
  - deploy              # Deployment

build_job:              # Define a build task
  stage: build          # Specify the stage
  script:
    - mvn package       # Execute command: build the application

test_job:               # Define a test task
  stage: test           # Specify the stage
  script:
    - mvn test          # Execute command: run unit tests

deploy_job:             # Define a deployment task
  stage: deploy         # Specify the stage
  script:
    - ./deploy.sh         # Execute command: invoke script to deploy the application
  only:
    - master            # Execute only when the master branch is committed

This solves the problem. R&D teams integrate by following this template, and if the security scanning node changes later, we only need to update the common/gitlab_ci_template project.

But did you notice a problem? In the shared template in common/gitlab_ci_template, secscan-job runs in the secscan stage. What if the business project does not define that stage? Then the Job certainly cannot run!

The default Stage mechanism of GitLab CI

If the project defines its own stages, a stage that exists only in the included shared template does not take effect (the pipeline reports an error; you can try it yourself). To solve this problem, we need to study GitLab CI's Stage mechanism.
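To see why such a pipeline is rejected, the following Python sketch merges a template into a project configuration and looks for jobs whose stage is not declared. It is a deliberately simplified model: real include processing deep-merges configurations, and merge_config and jobs_with_unknown_stage are hypothetical helpers, not GitLab functions.

```python
ALWAYS_AVAILABLE = {".pre", ".post"}  # built-in stages that need no declaration


def merge_config(template, project):
    """Simplified include: project keys override template keys of the same name."""
    merged = dict(template)
    merged.update(project)
    return merged


def jobs_with_unknown_stage(config):
    """Return the names of jobs whose stage is not in the config's stages list."""
    allowed = set(config.get("stages", [])) | ALWAYS_AVAILABLE
    return sorted(
        name
        for name, body in config.items()
        if isinstance(body, dict) and "script" in body
        and body.get("stage", "test") not in allowed
    )


template = {"secscan-job": {"stage": "secscan", "script": ["python3 /home/agent/gitlab_secscan.py"]}}
project = {
    "stages": ["build", "test", "deploy"],  # no secscan stage declared
    "build_job": {"stage": "build", "script": ["mvn package"]},
}
print(jobs_with_unknown_stage(merge_config(template, project)))
```

Because the project's stages list is the one that applies, secscan-job ends up pointing at a stage that does not exist.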

Let's look at how the official documentation describes stages (https://docs.gitlab.com/ee/ci/yaml/#stages):

Use stages to define stages that contain groups of jobs. Use stage in a job to configure the job to run in a specific stage.
If stages is not defined in the .gitlab-ci.yml file, the default pipeline stages are:
.pre
build
test
deploy
.post

If a project does not configure stages in .gitlab-ci.yml, these default stages apply and can be used directly without being defined.

However, once a project defines custom stages, the default build, test, and deploy stages are no longer available unless the project lists them.

Notice that the first stage (.pre) and the last stage (.post) differ from the others. Let's see what the documentation says:

If a pipeline contains only jobs in the .pre or .post stages, it does not run. There must be at least one other job in a different stage. .pre and .post stages can be used in required pipeline configuration to define compliance jobs that must run before or after project pipeline jobs.

This means .pre and .post are always available. If a pipeline runs jobs in other stages, the .pre stage runs before all of them and the .post stage runs after all of them.

And these two stages do not need to be defined separately!
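The resulting stage order can be summarized in a few lines of Python. This is a sketch of the documented behavior, not GitLab code; effective_stages is a hypothetical helper name.

```python
DEFAULT_STAGES = ["build", "test", "deploy"]


def effective_stages(custom_stages=None):
    """Return the stage order: .pre first, then the project's stages, then .post."""
    middle = [s for s in (custom_stages or DEFAULT_STAGES) if s not in (".pre", ".post")]
    return [".pre", *middle, ".post"]


print(effective_stages())
print(effective_stages(["compile", "release"]))
```

Whatever stages a project defines, a job placed in .pre runs ahead of them.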

Returning to our improved scanning configuration: we only need to put our secscan-job into the .pre stage.

The .pre stage runs before the first stage the project defines, which fully meets our checkpoint needs: we want to scan and gate the pipelines that trigger compilation and deployment, and we do not need to handle ordinary commits that do not trigger a pipeline.

The shared template then looks like this:

secscan-job:
  stage: .pre
  script:
    - export MULT_COMMIT_BRANCH=${CI_COMMIT_BRANCH}
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_MERGE_REQUEST_TARGET_BRANCH_NAME}; fi
    - if [ ! "$MULT_COMMIT_BRANCH" ]; then export MULT_COMMIT_BRANCH=${CI_COMMIT_TAG}; fi
    - python3 /home/agent/gitlab_secscan.py
      --gitUrl "${CI_PROJECT_URL}.git"
      --gitCommitId ${CI_COMMIT_SHA}
      --gitBranch $MULT_COMMIT_BRANCH
      --gitProjectPath ${CI_PROJECT_PATH}
      --url ${CI_PIPELINE_URL}
      --users ${GITLAB_USER_LOGIN}
      --pipelineId ${CI_PIPELINE_ID}

Commit the code and take a look: the .pre stage runs, and our security scanning Job is triggered first!

At this point a project only needs to include our shared template, with no other changes to its .gitlab-ci.yml:

include:
  - project: 'common/gitlab_ci_template'  # Project name
    ref: master                            # Branch
    file: '/.base_gitlab_ci.yml'           # Shared configuration file, path relative to the repository root

If testing shows that push events trigger secscan-job normally but Merge Request events do not, you can use the workaround described in https://gitlab.com/gitlab-org/gitlab-runner/-/issues/5970

The fix is to add the following to the Job configuration:

rules:
    - when: on_success

Please take note of this.

GitLab CI integration with comprehensive checkpoint capabilities for security scanning

With the optimization above, projects need no changes to their .gitlab-ci.yml other than including our template.

A problem remains, however. A project may never include the shared template at all. And a project that has been blocked over a security issue can simply comment out the include, commit its code, and then restore the include afterwards.

In that case our security checkpoint might as well not exist: it is trivially bypassed!

Is there a way to implement a mandatory checkpoint that R&D colleagues cannot skip?

There is: perform the checkpoint scan in GitLab Runner.

All of the previous integration approaches live on the repository (REPO) side, which project colleagues control, so we cannot enforce a checkpoint there.

To take the checkpoint out of the project's control, we can move the security detection capability to the GitLab Runner side.

This approach has the following advantages:

No per-project integration is required; security detection runs automatically whenever a Pipeline is invoked
New projects are covered at "zero cost" and without the project noticing
Security detection is mandatory and cannot be bypassed by the project

How to achieve this?

As mentioned earlier, all Jobs run on GitLab Runner, whether they are security scanning Jobs or business Jobs.

If the Runner offers a hook that fires before a business Job executes, we can use that hook to run the security scan first.

Fortunately, the GitLab Runner configuration provides exactly such a hook: pre_clone_script.

pre_clone_script
This setting makes GitLab Runner execute a user-defined shell script before it downloads the code. It is typically used to set environment variables and other pre-execution state. For details, see https://docs.gitlab.com/runner/configuration/advanced-configuration.html#the-runners-section. (Newer GitLab Runner releases name this setting pre_get_sources_script.)
If this shell script exits with a non-zero code, for example exit -1, the current job is stopped and marked as failed in the pipeline.

If our GitLab Runner uses the shell executor, we only need to edit the Runner server's configuration file, /etc/gitlab-runner/config.toml, as follows:

[[runners]]
  name = "ubuntu"
  url = "https://gitlab.xxx.com/"
  token = "AUt-sfU1xxxxxx"
  executor = "shell"
  pre_clone_script="echo pre_clone_script && pwd"
  pre_build_script="echo pre_build_script_test && pwd"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]

The content we actually added is:

pre_clone_script="echo pre_clone_script && pwd"
pre_build_script="echo pre_build_script_test && pwd"

Before GitLab Runner executes a Job, it first runs the echo pre_clone_script && pwd script and then the Job's own content.

With that configured, let's try committing some code.

All tasks ran correctly, and the task logs show the following:

The shell script we added to the GitLab Runner configuration file was executed, even though nothing was configured in the project itself.

At this point we have a way to gate project code from the GitLab Runner side. All that remains is to replace the test shell script with one that runs a code security scan.

For example, suppose we write a script called seccheck.sh:

#!/bin/bash
echo "Start security scan"

target_agent_path="/tmp/sec_agent"
agent_api="https://xxx.com/gitlab/sec_agent" # Remote address of the security agent

# Resolve the branch name with the same fallbacks used in secscan-job
MULT_COMMIT_BRANCH="${CI_COMMIT_BRANCH:-${CI_MERGE_REQUEST_TARGET_BRANCH_NAME:-$CI_COMMIT_TAG}}"

# If the Runner uses the docker or k8s executor, download the agent remotely and then run it; with the shell executor the agent can be placed on the Runner host in advance and this step is unnecessary
# stderr is captured in download_error; if the download fails, exit 0 so that the job is not blocked
{ download_error=$(wget --tries=2 --timeout=10 --quiet -O "$target_agent_path" "$agent_api" 2>&1 >&3 3>&-); } 3>&1 || {
  exit 0
}

chmod +x "$target_agent_path"

# Send the git project data to the agent, which calls the SAST engine's API and decides from the result whether to block; exit code 255 from the agent means the pipeline must be blocked
{ security_agent_errors=$("$target_agent_path" --gitUrl "${CI_PROJECT_URL}.git" --gitCommitId "$CI_COMMIT_SHA" --gitBranch "${MULT_COMMIT_BRANCH}" --url "${CI_PIPELINE_URL}" --users "${GITLAB_USER_LOGIN}" --gitProjectPath "${CI_PROJECT_PATH}" --pipelineId "${CI_PIPELINE_ID}" --pipelineName "${CI_PROJECT_PATH}" --ciJobName "${CI_JOB_NAME}" 2>&1 >&3 3>&-); } 3>&1 || {
  if [[ $? == 255 ]]; then
    exit 255 # Block: a non-zero exit code fails the job
  else
    echo "failed security scan"
  fi
}
echo "Finish security scan"

Then add the following to the GitLab Runner configuration:

pre_clone_script="path/seccheck.sh"

This achieves our ultimate goal.

Resolving the remaining issue

So far we have essentially completed mandatory detection and checkpoints for GitLab projects. The final approach relies on GitLab Runner's pre_clone_script setting.

This setting has one drawback: it is invoked before every Job.

Such repeated calls are clearly unnecessary. What if we want later Jobs to skip the security scan once the first Job has completed it?

We can put a scheduling proxy between the Job and the security scan. It first uses the GitLab REST API to fetch the list of all Jobs in the current Pipeline and checks whether the current Job is the first one (Job1). If it is not, the security scan is skipped.
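A minimal sketch of such a check is shown below. It uses GitLab's pipeline jobs endpoint (GET /projects/:id/pipelines/:pipeline_id/jobs) with a PRIVATE-TOKEN header. Treating the job with the smallest ID as "the first Job" is an assumption of this sketch, as are the helper names; results are paginated, so only the first 100 jobs are considered here.

```python
import json
import urllib.request


def fetch_pipeline_jobs(api_url, project_id, pipeline_id, token):
    """List a pipeline's jobs through the GitLab REST API (first page, up to 100 jobs)."""
    url = f"{api_url}/projects/{project_id}/pipelines/{pipeline_id}/jobs?per_page=100"
    request = urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def is_first_job(jobs, current_job_id):
    """Assumption: the job with the smallest ID counts as the pipeline's first job."""
    return bool(jobs) and current_job_id == min(job["id"] for job in jobs)


# Offline example with a made-up job list
sample_jobs = [{"id": 102, "name": "test_job"}, {"id": 101, "name": "build_job"}]
print(is_first_job(sample_jobs, 101))
```

In a Runner hook, the arguments could come from the predefined variables CI_API_V4_URL, CI_PROJECT_ID, CI_PIPELINE_ID, and CI_JOB_ID, together with an API token that the security team provisions on the Runner.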

This fully solves the problem of scanning the same pipeline multiple times.

If your GitLab Runner uses the k8s executor instead of shell, you can use RUNNER_PRE_CLONE_SCRIPT in place of the pre_clone_script setting.

To conclude

This article has walked step by step through several ways to add code security scanning to projects that use GitLab CI.

None of these methods is inherently better than the others; the choice depends on the business scenario. If the number of projects is small, the most basic integration is fine. If the number of projects is very large and security checkpoints are required, the GitLab Runner based method is the best fit.

Author: l4yn3@Xiaomi Security
The author focuses on application security testing, code auditing, and the design, research, and integration of DevSecOps toolchains.

Last modified: 2025-03-29 08:58:40