How to use truffleHog to search for high-entropy strings and sensitive data in Git repositories to protect the security of code repositories

About truffleHog

truffleHog is a secret-scanning tool that searches a target Git repository for high-entropy strings and other sensitive data, such as keys or credentials that were committed by accident. It works by analyzing the repository's full commit history across its branches, and the findings can then be used to improve the security of the code repository.

Operation Mechanism

The tool walks the entire commit history of every branch in the target Git repository and checks the diff of each commit for sensitive data. Detection relies on two methods: regular expressions and entropy. For the entropy check, truffleHog evaluates the Shannon entropy of every text blob longer than 20 characters in each diff, once with respect to the base64 character set and once with respect to the hexadecimal character set. Whenever a high-entropy string longer than 20 characters is detected, it is printed to the screen.
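The entropy check described above can be sketched in Python roughly as follows. This is a minimal illustration, not truffleHog's actual source: the function names, the exact character-set constants, and the threshold passed in by the caller are assumptions made for the example.

```python
import math

# Character sets for the two checks (assumed here for illustration).
BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "0123456789abcdefABCDEF"


def shannon_entropy(data: str, charset: str) -> float:
    """Return the Shannon entropy (bits per character) of data over charset."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in charset:
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


def candidate_strings(text: str, charset: str, min_len: int = 20):
    """Yield maximal runs of charset characters that are longer than min_len."""
    run = ""
    for ch in text:
        if ch in charset:
            run += ch
        else:
            if len(run) > min_len:
                yield run
            run = ""
    if len(run) > min_len:
        yield run


def high_entropy_strings(text: str, charset: str, threshold: float):
    """Return candidate strings whose entropy exceeds threshold (caller-chosen)."""
    return [s for s in candidate_strings(text, charset)
            if shannon_entropy(s, charset) > threshold]
```

For example, calling high_entropy_strings(diff_text, HEX_CHARS, 3.0) would flag long hexadecimal runs with many distinct characters while ignoring a long run of a single repeated character. The threshold of 3.0 is only an illustrative value.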

Tool Installation

The tool is written in Python, so it can be installed with pip:

pip install truffleHog

Custom Configuration

Custom regular expressions can be added with "--rules /path/to/rules", where the rules file is a JSON file in the following format:

{
    "RSA private key": "-----BEGIN EC PRIVATE KEY-----"
}

The "--allow" option takes a JSON file in the same style that lists matches to be explicitly allowed. A value can be a literal string or, with the "regex:" prefix, a regular expression:

{
    "local self signed test key": "-----BEGIN EC PRIVATE KEY-----\nfoobar123\n-----END EC PRIVATE KEY-----",
    "git cherry pick SHAs": "regex:Cherry picked from .*"
}
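Before passing a rules file to the tool, it can help to confirm that the file is valid JSON and that every pattern compiles. The following Python sketch does that for a rules document in the name-to-pattern format shown above. The helper names load_rules and find_matches are hypothetical and are not part of truffleHog.

```python
import json
import re


def load_rules(rules_json: str) -> dict:
    """Parse a rules document and compile every pattern.

    Raises ValueError naming the offending rule if a pattern does not compile.
    """
    raw = json.loads(rules_json)
    compiled = {}
    for name, pattern in raw.items():
        try:
            compiled[name] = re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"rule {name!r} is not a valid regex: {exc}") from exc
    return compiled


def find_matches(text: str, rules: dict) -> dict:
    """Return {rule name: [matched strings]} for every rule that matches text."""
    found = {}
    for name, regex in rules.items():
        hits = [m.group(0) for m in regex.finditer(text)]
        if hits:
            found[name] = hits
    return found
```

Running find_matches over a small sample of known-bad text is a quick way to check that a new rule fires where expected before scanning a whole repository.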

Earlier versions of truffleHog worked by running entropy checks on Git diffs. That check is still available in the current version, which adds high-signal regular expression checks and also lets the entropy check be turned off:

trufflehog --regex --entropy=False https://github.com/dxa4481/truffleHog.git

or

trufflehog file:///user/dxa4481/codeprojects/truffleHog/

The "--include_paths" and "--exclude_paths" options limit the scan to a subset of objects in the Git history. Each option takes a file of regular expressions, one per line, that are matched against Git object paths. The following example files are for reference:

include-patterns.txt:

src/
# lines beginning with "#" are treated as comments and are ignored
gradle/
# regexes must match the entire path, but can use python's regex syntax for
# case-insensitive matching and other advanced options
(?i).*\.(properties|conf|ini|txt|y(a)?ml)$
(.*\/)?id_[rd]sa$

exclude-patterns.txt:

(.*\/)?\.classpath$
.*\.jmx$
(.*\/)?test\/(.*\/)?resources\/

These filter files can then be applied with the following command:

trufflehog --include_paths include-patterns.txt --exclude_paths exclude-patterns.txt file://path/to/my/repo.git
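The include and exclude logic can be pictured with a short Python sketch. It is only an approximation under stated assumptions: the helper names are hypothetical, and patterns are applied with re.match, which anchors at the start of the path. That anchoring is assumed here because it lets a bare prefix such as "src/" select everything beneath that directory.

```python
import re


def load_patterns(lines):
    """Compile one regex per non-empty line, skipping '#' comment lines."""
    patterns = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(re.compile(line))
    return patterns


def path_included(path, include_patterns=None, exclude_patterns=None):
    """Decide whether a Git object path should be scanned.

    A path is scanned when it matches at least one include pattern
    (or no include patterns are given) and matches no exclude pattern.
    """
    if include_patterns and not any(p.match(path) for p in include_patterns):
        return False
    if exclude_patterns and any(p.match(path) for p in exclude_patterns):
        return False
    return True
```

Feeding the two example files through load_patterns and calling path_included on a handful of repository paths is an easy way to preview what a filter set would cover before starting a long scan.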

With these filters, issues found in files under the root-level src directory of the target Git repository are reported, unless the path matches an exclude pattern, for example a file ending in .classpath or .jmx, or a file under a test resources directory. More usage information is available through the "-h" or "--help" option.

Tool help information

usage: trufflehog [-h] [--json] [--regex] [--rules RULES] [--allow ALLOW]
                  [--entropy DO_ENTROPY] [--since_commit SINCE_COMMIT]
                  [--max_depth MAX_DEPTH]
                  git_url

Find secrets hidden in the depths of git.

positional arguments:
  git_url               URL for secret searching

optional arguments:
  -h, --help            show this help message and exit
  --json                Output in JSON
  --regex               Enable high signal regex checks
  --rules RULES         Ignore default regexes and source from json list file
  --allow ALLOW         Explicitly allow regexes from json list file
  --entropy DO_ENTROPY  Enable entropy checks
  --since_commit SINCE_COMMIT
                        Only scan from a given commit hash
  --branch BRANCH       Scans only the selected branch
  --max_depth MAX_DEPTH
                        The max commit depth to go back when searching for
                        secrets
  -i INCLUDE_PATHS_FILE, --include_paths INCLUDE_PATHS_FILE
                        File with regular expressions (one per line), at least
                        one of which must match a Git object path in order for
                        it to be scanned; lines starting with "#" are treated
                        as comments and are ignored. If empty or not provided
                        (default), all Git object paths are included unless
                        otherwise excluded via the --exclude_paths option.
  -x EXCLUDE_PATHS_FILE, --exclude_paths EXCLUDE_PATHS_FILE
                        File with regular expressions (one per line), none of
                        which may match a Git object path in order for it to
                        be scanned; lines starting with "#" are treated as
                        comments and are ignored. If empty or not provided
                        (default), no Git object paths are excluded unless
                        effectively excluded via the --exclude_paths option.
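The "--json" option makes it practical to drive the tool from a script. The Python sketch below builds a command line from flags that appear in the help text and parses the result. Several points are assumptions rather than documented behavior: that the JSON output contains one object per line, that each finding carries a "reason" field, and that the exit status can be non-zero when findings are reported. All helper names are hypothetical.

```python
import json
import subprocess
from collections import Counter


def build_command(repo_url, regex=True, entropy=False):
    """Build a trufflehog argv using only flags listed in the help text."""
    cmd = ["trufflehog", "--json"]
    if regex:
        cmd.append("--regex")
    cmd.append(f"--entropy={entropy}")
    cmd.append(repo_url)
    return cmd


def parse_findings(output):
    """Parse output assumed to hold one JSON object per line."""
    findings = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            findings.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip anything that is not JSON
    return findings


def count_by_reason(findings):
    """Tally findings by their 'reason' field (field name is an assumption)."""
    return Counter(f.get("reason", "unknown") for f in findings)


def scan(repo_url):
    """Run trufflehog (must be on PATH) and return the parsed findings."""
    # The return code is not checked, on the assumption that it may be
    # non-zero when findings are reported.
    result = subprocess.run(build_command(repo_url),
                            capture_output=True, text=True)
    return parse_findings(result.stdout)
```

A call such as count_by_reason(scan("file:///path/to/my/repo.git")) would then give a quick per-rule summary, provided the output matches the assumptions above.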

Using truffleHog with Docker

First, change into the directory containing the target Git repository:

cd /path/to/git

Then run truffleHog from its Docker image with the following command:

docker run --rm -v "$(pwd):/proj" dxa4481/trufflehog file:///proj

The "-v" option mounts the current working directory (pwd) at /proj inside the Docker container, and "file:///proj" points truffleHog at that /proj directory within the container.

Project address

truffleHog: https://github.com/dxa4481/truffleHog

Reference materials

https://join.slack.com/t/trufflehog-community/shared_invite/zt-pw2qbi43-Aa86hkiimstfdKH9UCpPzQ
