1. The problem
Recently, our operations team received a memory alarm for the online LB (load balancing) service: some machines in the LB cluster had memory usage above 80%, some above 90%, and usage was still climbing. The alarm worried the whole team. Our team owns the LB service, which is the traffic entry point for retail, logistics, technology, and other business lines and forwards traffic for tens of thousands of services; any fault here has a wide impact on the business, so the rapidly growing memory usage had to be addressed immediately. At this point it was only a memory alarm and business was not yet affected. We took the LB instances with memory usage above 90% out of service, to keep memory pressure from crashing the LB service and affecting the business, and the operations team kept a close watch on further memory alarms.
2. The troubleshooting process
The development team first inspected /proc/meminfo and found that kernel Slab memory was abnormally high: about 39 GB out of the 64 GB total, almost all of it reclaimable.

$ cat /proc/meminfo
MemTotal:       65922868 kB
MemFree:         9001452 kB
...
Slab:           39242216 kB
SReclaimable:   38506072 kB
SUnreclaim:       736144 kB
...
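To see which slab cache was eating the memory, the next step is slabtop (a representative invocation; -o prints a single snapshot and -s c sorts by cache size):

# one snapshot of slab caches, largest cache first;
# in our case the dentry cache sat at the top, tens of GB in size
$ slabtop -o -s c | head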
The slabtop analysis showed that dentry objects accounted for a very high share of the kernel slab. Since dentry objects relate to files, and in Linux everything is a file, we suspected a connection to socket files. Further investigation found a curl-based HTTPS probe script running on the LB machines, and that script was leaking dentry objects; an article on the curl forum confirmed the issue. The article explains that the NSS library which curl 7.19.7 depends on has a dentry-leak bug when sending HTTPS requests. A check showed our curl is exactly version 7.19.7, and the root cause was finally clear:
$ curl -V
curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz
$ rpm -aq|grep nss-
nss-util-3.16.1-3.el6.x86_64
nss-sysinit-3.16.1-14.el6.x86_64
nss-softokn-freebl-3.14.3-17.el6.x86_64
nss-softokn-3.14.3-17.el6.x86_64
nss-3.16.1-14.el6.x86_64
nss-tools-3.16.1-14.el6.x86_64
The article notes that setting the environment variable NSS_SDB_USE_CACHE works around this bug, and we verified that the fix is effective.
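As a minimal sketch, the probe script only needs to export the variable before calling curl (the probe URL below is a placeholder, not our real endpoint):

#!/bin/bash
# Workaround for the NSS dentry leak: let NSS cache its certificate
# database access instead of touching the filesystem on every request.
export NSS_SDB_USE_CACHE=yes
curl -s -o /dev/null https://example.com/health    # placeholder probe URL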
3. Solution
1. For now, stop the probe script, and during low-traffic periods release the accumulated cache on machines with memory usage above 90% via drop_caches (see the snippet after this list).
2. After the big promotion, set the environment variable NSS_SDB_USE_CACHE in the probe script to fix the problem for good.
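A minimal sketch of the cleanup step in item 1 (run as root during a low-traffic window; writing 2 reclaims dentries and inodes, which is exactly where our leak lived):

$ sync                                  # flush dirty pages to disk first
$ echo 2 > /proc/sys/vm/drop_caches     # reclaim dentries and inodes
# writing 1 drops the page cache; writing 3 drops both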
4. Review and Summary
The root cause of the memory surge was a dentry leak in the NSS library that curl 7.19.7 depends on; the probe script merely exposed it. Since this came down to a Linux memory leak, it is well worth revisiting Linux memory management systematically, which will pay off the next time we troubleshoot memory problems.
1) Linux memory addressing
The Linux kernel manages the address space of processes through virtual memory: both kernel and user processes are allocated virtual memory rather than physical memory, and memory addressing maps the virtual memory onto physical memory. The Linux kernel works with three kinds of addresses:
a. Logical address: each logical address consists of a segment and an offset, where the offset is the distance from the start of the segment to the actual address.
b. Linear address, also called virtual address: on a 32-bit machine it is a 32-bit unsigned integer, so up to 4 GB of memory can be addressed. It is usually written in hexadecimal, and "process memory" in Linux generally refers to this address space.
c. Physical address: used to address memory cells on the memory chips. It corresponds to the electrical signals sent from the CPU's address pins over the memory bus.
The memory management unit (MMU) translates a logical address into a linear address through a hardware circuit called the segmentation unit; a second hardware circuit, the paging unit, then translates the linear address into a physical address.
2) Linux paging mechanism
The paging unit translates linear addresses into physical addresses. Linear addresses are grouped into fixed-length intervals called pages; the contiguous linear addresses within a page map to contiguous physical addresses. The term "page" usually refers both to the set of linear addresses and to the data they contain. The paging unit likewise divides all of RAM into fixed-length page frames (also called physical pages); each page frame holds one page, so a page frame has the same length as a page. It is important to distinguish the two: a page frame is a region of main memory, while a page is just a block of data that may be stored in any page frame or on disk. The data structures that map linear addresses to physical addresses are called page tables; they are stored in main memory and must be properly initialized by the kernel before the paging unit is enabled.
The x86_64 Linux kernel uses a 4-level paging model, generally with 4 KB pages, and defines 4 types of page tables:
a. Page Global Directory
b. Page Upper Directory
c. Page Middle Directory
d. Page Table
The page global directory contains the addresses of several page upper directories, which in turn contain the addresses of several page middle directories, which then contain the addresses of several page tables. Each page-table entry points to a page frame. A linear address is therefore split into 5 parts: four table indexes plus an in-page offset, as the sketch below shows.
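To make the split concrete, here is a small bash sketch of my own (assuming 48-bit virtual addresses and 4 KB pages on x86_64): four 9-bit table indexes plus a 12-bit in-page offset.

# split a 48-bit x86_64 virtual address into 4 table indexes + offset
addr=0x00007f89a1234567                  # example address
printf 'PGD=%d PUD=%d PMD=%d PTE=%d offset=0x%03x\n' \
    $(( (addr >> 39) & 0x1FF )) \
    $(( (addr >> 30) & 0x1FF )) \
    $(( (addr >> 21) & 0x1FF )) \
    $(( (addr >> 12) & 0x1FF )) \
    $((  addr        & 0xFFF ))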
3) NUMA architecture
With the arrival of the multi-core era, having every CPU core reach memory across one shared data bus leads to long access latencies, and the NUMA (Non-Uniform Memory Access) architecture emerged to address this. The system's physical memory is divided into several nodes, each bound to a different set of CPU cores; a CPU core accesses its local memory node directly, with minimal latency.
The relationship between NUMA nodes and CPU cores can be viewed with the lscpu command.
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2001.000
BogoMIPS:              3999.43
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0-7,16-23    # these cores are bound to NUMA node 0
NUMA node1 CPU(s):     8-15,24-31   # these cores are bound to NUMA node 1
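If the numactl package is installed, per-node memory sizes and inter-node access distances are also available:

$ numactl --hardware    # node sizes, free memory per node, node distances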
4) Buddy system algorithm
To allocate groups of contiguous page frames, the Linux kernel uses the well-known buddy system algorithm, which establishes a robust and stable memory allocation strategy. It is one of the kernel's memory allocators and tackles external fragmentation: frequently requesting and releasing groups of contiguous page frames of different sizes inevitably leaves many small blocks of free page frames scattered among the allocated ones.
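The allocator's current state is visible in /proc/buddyinfo: each column counts the free blocks of order 0, 1, 2, ... (i.e., runs of 2^order contiguous page frames), so shrinking right-hand columns signal growing external fragmentation:

# columns = free blocks of 2^0, 2^1, ..., 2^10 contiguous pages per zone
$ cat /proc/buddyinfo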
5) Slab mechanism
The core idea of the slab mechanism is to manage memory from the perspective of objects, and it mainly solves internal fragmentation. Internal fragmentation arises from allocating in fixed-size blocks: the memory handed to a process may be larger than what it needs, and the excess is internal fragmentation. Slab is another memory allocator in the kernel. The slab allocator manages objects, where the so-called objects are kernel data structures (e.g., task_struct, file_struct); objects of the same type are grouped together. When such an object is requested, the slab allocator hands out a unit of that size from a slab list, and on release the unit is returned to the list rather than to the buddy system, thus avoiding internal fragmentation. The dentry object discussed above is allocated by the slab allocator.
The slab allocator sits on top of the buddy system in a hierarchical calling relationship: the buddy system manages memory in pages, while slab manages it in bytes. Slab first obtains several pages of memory from the buddy system, cuts them into fixed-size small blocks (the objects), and then allocates objects according to the declared object data structure.
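An individual slab cache such as dentry can be watched directly through /proc/slabinfo (usually readable by root only; the leading fields are the active object count, total object count, and object size in bytes):

$ sudo grep dentry /proc/slabinfo
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> ...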
6) Process memory distribution
All processes must occupy a certain amount of memory, used to store program code loaded from disk, data from user input, and so on. Memory can be allocated statically in advance and reclaimed all at once, or allocated and reclaimed dynamically on demand. The memory space of an ordinary process contains five different data areas (a live view follows the list):
a. Code segment (text): The mapping of program code in memory, which stores the binary code of function bodies and is usually used to store program execution code (i.e., machine instructions executed by the CPU).
b. Data segment (data): Stores globally initialized variables and static local variables with non-zero initial values in the program. The data segment belongs to static memory allocation (static storage area) and is readable and writable.
c. BSS segment (bss): Uninitialized global variables and static local variables.
d. Heap (heap): Dynamically allocated memory of variable size, which can be grown dynamically (e.g., with malloc) or shrunk (e.g., with free).
e. Stack (stack): Stores temporarily created local variables, such as function parameters and local variables inside function calls.
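These areas can be observed on a live process through /proc/<pid>/maps; for example, for the current shell (addresses vary from run to run):

# text/data segments of the bash binary, plus the heap and stack regions
$ grep -E 'bash|\[heap\]|\[stack\]' /proc/$$/maps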
The kernel has the highest priority in the operating system: when a kernel function requests memory, it must be satisfied promptly. Memory requests from user-space processes, by contrast, are treated as non-urgent, and the kernel tries to defer their dynamic memory allocation:
a. Demand paging: allocation is deferred until the process accesses a page that is not yet in RAM, which triggers a page-fault exception.
b. Copy on Write (COW): parent and child processes share page frames instead of copying them, but shared page frames cannot be modified; only when the parent or child tries to write to a shared page frame does the kernel copy it into a new page frame and mark it writable.
7) Linux memory inspection tools
a. The free command monitors overall system memory usage.
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           31Gi        13Gi       8.0Gi       747Mi        10Gi        16Gi
Swap:         2.0Gi       321Mi       1.7Gi
b. The top command shows both system-wide and per-process memory; its key memory fields are described below, with a batch-mode example after the list.
• VIRT: Virtual memory size (KiB). All virtual memory used by the process, including code, data, shared libraries, pages swapped out to the swap area, and pages that are mapped but not yet used (not loaded into physical memory).
• RES: Resident memory size (KiB). All physical memory occupied by the process, excluding the pages swapped out to the swap area.
• SHR: Shared memory size (KiB). All shared memory the process can read, not all of which is necessarily counted in RES; it reflects memory that may be shared with other processes.
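In batch mode, top can sort by a chosen field to surface the biggest memory consumers (the -o flag is available in procps-ng top):

# one snapshot of the top processes, sorted by memory usage
$ top -b -n 1 -o %MEM | head -15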
c. The smaps file
cat /proc/$pid/smaps shows the layout of a process's virtual memory space, one region at a time:
0082f000-00852000 rw-p 0022f000 08:05 4326085   /usr/bin/nginx/sbin/nginx
Size:                140 kB
Rss:                 140 kB
Pss:                  78 kB
Shared_Clean:         56 kB
Shared_Dirty:         68 kB
Private_Clean:         4 kB
Private_Dirty:        12 kB
Referenced:          120 kB
Anonymous:            80 kB
AnonHugePages:         0 kB
Swap:                  0 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
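Because Pss splits each shared page proportionally among the processes using it, summing the Pss fields gives a fair estimate of a process's real memory footprint; a small awk one-liner suffices:

# total proportional set size of a process, in kB
$ awk '/^Pss:/ {sum += $2} END {print sum " kB"}' /proc/$pid/smaps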
d. vmstat
vmstat stands for Virtual Memory Statistics; it monitors the operating system's virtual memory, processes, and CPU activity in real time.
## sample once per second, 3 times
$ vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa st
 0  0  23348 3840758  30420 795596    0    0     0     1     0     0  0  0 100 0  0
 0  0  23348 3936758  30420 795596    0    0     0     0  1052  1569  0  0 100 0  0
 0  0  23348 3920758  30420 795596    0    0     0     0   966  1558  0  0 100 0  0
e. The meminfo file
On Linux, the /proc/meminfo file records detailed information about the system's memory usage.
$ cat /proc/meminfo
MemTotal:        8052444 kB
MemFree:         2754588 kB
MemAvailable:    3934252 kB
Buffers:          137128 kB
Cached:          1948128 kB
SwapCached:            0 kB
Active:          3650920 kB
Inactive:        1343420 kB
Active(anon):    2913304 kB
Inactive(anon):   727808 kB
Active(file):     737616 kB
Inactive(file):   615612 kB
Unevictable:         196 kB
Mlocked:             196 kB
SwapTotal:       8265724 kB
SwapFree:        8265724 kB
Dirty:               104 kB
Writeback:             0 kB
AnonPages:       2909332 kB
Mapped:           815524 kB
Shmem:            732032 kB
Slab:             153096 kB
SReclaimable:      99684 kB
SUnreclaim:        53412 kB
KernelStack:       14288 kB
PageTables:        62192 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12291944 kB
Committed_AS:   11398920 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:   1380352 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      201472 kB
DirectMap2M:     5967872 kB
DirectMap1G:     3145728 kB
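Tying this back to the incident: the three Slab lines are the ones worth alerting on. A simple watch makes runaway growth of reclaimable slab (where the dentry cache lives) easy to spot:

# re-check the slab counters every 60 s; in our incident SReclaimable
# kept climbing while MemFree fell
$ watch -n 60 "grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo"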
Some of the content in the summary section comes from "Understanding the Linux Kernel" and some is based on my own understanding. Corrections are welcome for any inaccuracies; some images are from the internet.
Author: Li Zunju
