Author: JD Health, Zhang Na
1. The significance and challenges of concurrent programming
The point of concurrent programming is to fully utilize every core of the processor to achieve the highest processing performance and make programs run faster. To raise computation speed, processors themselves also apply a series of optimizations, for example:
1. Hardware upgrades: to bridge the orders-of-magnitude speed gap between the CPU's high-speed internal storage and main memory, the traditional hardware memory architecture introduces multiple levels of cache. The new problem this creates is that the same data now exists simultaneously in the caches and in main memory, so cache consistency must be maintained.
2. Processor optimizations: mainly compiler reordering, instruction-level reordering, and memory-system reordering. By reordering at these three levels (while preserving single-threaded semantics, overlapping instruction execution through instruction-level parallelism, and buffering cache loads and stores), overall execution speed improves. The problem is that in a multi-threaded environment, neither the compiler nor the CPU can recognize data dependencies that span threads, which can change a program's observable results.
The benefits of concurrent programming are enormous, but to write code that is both thread-safe and efficient, access to mutable shared state must be managed. When multiple threads operate on the same object, values can change underneath each other and fall out of sync, so the results may differ wildly from the theoretical values; such an object is not thread-safe. Conversely, if the computation always behaves correctly no matter how the runtime schedules the threads or how their executions interleave, the object is thread-safe. Ensuring thread safety is therefore an easily overlooked issue in concurrent programming, and also a significant challenge.
To understand why thread-safety issues arise, we must first understand two key questions:
1. How threads communicate, that is, what mechanism threads use to exchange information.
2. How threads are synchronized, that is, how the program controls the occurrence order of different threads.
2. Concurrent Programming in Java
Java uses a shared memory model for concurrency, and communication between Java threads is always implicit, and the entire communication process is completely transparent to programmers.
2.1 Java Memory Model
To balance programmers' desire for strong memory visibility (more constraints on compilers and processors) against high computational performance (as few constraints as possible), Java defines the Java Memory Model (JMM). The convention is that as long as the program's execution result is unchanged, compilers and processors may optimize as they wish. The main problem the JMM solves, then, is providing memory-visibility guarantees by establishing a communication specification between threads.
The structure of the JMM is shown as follows:
Within this model, local variables created inside a thread, method parameters, and the like are used only by that thread and raise no concurrency issues. For shared variables, the JMM specifies how and when one thread may see the value of a shared variable modified by another thread, and how to synchronize access to shared variables when necessary.
To control the interaction between the working memory and the main memory, the following specifications are defined:
・All variables are stored in main memory (Main Memory).
・Each thread has its own private local memory (Local Memory), which holds copies of the shared variables that the thread reads or writes.
・All of a thread's operations on variables must be performed in its local memory; it cannot read or write main memory directly.
・Threads cannot access variables in each other's local memory directly.
In terms of specific implementation, eight types of operations are defined:
1. lock: acts on main memory; marks a variable as exclusively owned by one thread.
2. unlock: acts on main memory; releases that exclusive state.
3. read: acts on main memory; transfers a variable's value from main memory into a thread's working memory.
4. load: acts on working memory; places the value delivered by read into the working-memory copy of the variable.
5. use: acts on working memory; passes the value of a working-memory variable to the execution engine.
6. assign: acts on working memory; assigns a value received from the execution engine to a working-memory variable.
7. store: acts on working memory; transfers the value of a working-memory variable to main memory.
8. write: acts on main memory; places the value delivered by store into the main-memory variable.
These operations all meet the following principles:
• Neither read/load nor store/write may appear alone; each pair must occur together.
• Before an unlock operation is performed on a variable, the variable must first be synchronized back to main memory (via store and write).
2.2 Concurrency Keywords in Java
Building on these rules, Java provides keywords such as volatile and synchronized to ensure thread safety; their underlying principle is to attack concurrency problems from two directions: restricting processor optimizations and using memory barriers. At the variable level, volatile can be declared on a variable of any type, giving its reads and writes the same atomicity that reads and writes of primitive and reference variables have. When a scenario needs atomicity over a larger scope, synchronized blocks are required. The Java memory model provides the lock and unlock operations for this purpose; the virtual machine exposes them through the bytecode instructions monitorenter and monitorexit, which appear in Java code as synchronized blocks, i.e., the synchronized keyword.
As for what the two keywords buy you: volatile only guarantees atomicity for reads and writes of a single volatile variable, while the exclusive nature of a lock guarantees atomicity for an entire critical section. Functionally, locks are more powerful than volatile; volatile has the edge in scalability and raw execution performance.
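The contrast can be seen in a minimal sketch (the class and counter names are illustrative, not from any library): a volatile counter still loses updates under `++`, because that read-modify-write spans several steps, while a synchronized method does not.

```java
public class VolatileVsSynchronized {
    private volatile int volatileCount = 0; // visible across threads, but ++ is not atomic
    private int lockedCount = 0;

    public void unsafeIncrement() {
        // read, add, write: two threads can interleave here and lose an update
        volatileCount++;
    }

    public synchronized void safeIncrement() {
        // the monitor makes the whole read-modify-write a critical section
        lockedCount++;
    }

    public int getVolatileCount() { return volatileCount; }
    public int getLockedCount()   { return lockedCount; }

    public static void main(String[] args) throws InterruptedException {
        VolatileVsSynchronized demo = new VolatileVsSynchronized();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                demo.unsafeIncrement();
                demo.safeIncrement();
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // lockedCount is always 200000; volatileCount is typically smaller
        System.out.println("volatile=" + demo.getVolatileCount()
                + " synchronized=" + demo.getLockedCount());
    }
}
```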
2.3 Concurrency Containers and Utility Classes in Java
2.3.1 CopyOnWriteArrayList
CopyOnWriteArrayList uses a reentrant lock to make element mutations thread-safe, but every add or remove copies the entire backing array, which can waste significant space.
public E get(int index) {
    return get(getArray(), index);
}

public boolean add(E e) {
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        Object[] elements = getArray();
        int len = elements.length;
        // Copy the old array into a new one that is one slot longer
        Object[] newElements = Arrays.copyOf(elements, len + 1);
        newElements[len] = e;
        setArray(newElements);
        return true;
    } finally {
        lock.unlock();
    }
}
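As a usage sketch (the button names are made up for illustration), the copy-on-write design means iteration works over a stable snapshot: concurrent writes never throw ConcurrentModificationException, but are also invisible to an iterator that is already running.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CowListDemo {
    public static void main(String[] args) {
        List<String> buttons = new CopyOnWriteArrayList<>();
        buttons.add("prescribe");
        buttons.add("refer");
        // The for-each loop iterates the snapshot taken at this point,
        // so adding during iteration is safe but not observed by the loop.
        for (String b : buttons) {
            buttons.add(b + "-copy");
        }
        System.out.println(buttons); // [prescribe, refer, prescribe-copy, refer-copy]
    }
}
```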
2.3.2 Collections.synchronizedList(new ArrayList<>())
This wrapper adds a layer of synchronized control around each List operation. Note that when traversing the list, the caller must manually synchronize around the whole iteration.
public void add(int index, E element) {
    // SynchronizedList wraps each List operation in a synchronized block
    synchronized (mutex) {list.add(index, element);}
}

public E remove(int index) {
    synchronized (mutex) {return list.remove(index);}
}
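A sketch of the traversal caveat: each individual call is synchronized on the wrapper's mutex, but a loop is many calls, so the Javadoc requires holding the list itself as the lock for the whole iteration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    public static void main(String[] args) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 5; i++) {
            list.add(i); // each individual add is internally synchronized
        }
        int sum = 0;
        // Iteration spans many calls; lock the list manually around the loop
        synchronized (list) {
            for (int v : list) {
                sum += v;
            }
        }
        System.out.println(sum); // 10
    }
}
```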
2.3.3 ConcurrentLinkedQueue
ConcurrentLinkedQueue appends nodes to the queue without blocking, by looping on CAS operations:
public boolean offer(E e) {
    checkNotNull(e);
    final Node<E> newNode = new Node<E>(e);

    for (Node<E> t = tail, p = t;;) {
        Node<E> q = p.next;
        if (q == null) {
            // p is the last node: CAS p's next from null to newNode
            if (p.casNext(null, newNode)) {
                if (p != t)
                    // Lazily swing tail to the actual last node
                    casTail(t, newNode);
                return true;
            }
        }
        else if (p == q)
            // p's next points to itself: p has fallen off the list,
            // so restart from the new tail if it moved, otherwise from head
            p = (t != (t = tail)) ? t : head;
        else
            // Keep walking toward the tail node
            p = (p != t && t != (t = tail)) ? t : q;
    }
}
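A short usage sketch: offer never blocks and never returns false for a non-null element, and since there is no counter field, size() has to walk the whole queue.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class ClqDemo {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        queue.offer("first");   // lock-free CAS append
        queue.offer("second");
        System.out.println(queue.poll()); // first  (FIFO order)
        System.out.println(queue.peek()); // second (inspected, not removed)
        System.out.println(queue.size()); // 1, computed in O(n)
    }
}
```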
3. An Online Case
3.1 Problem discovery
On the doctor side of the internet hospital, when a doctor opens the consultation IM chat page, dozens of function buttons must be loaded. During the COVID-19 wave in December 2022, QPS stayed very high all day, reaching 12 times the peak of an ordinary day, and occasionally an alarm fired indicating that the buttons were not fully displayed, with an occurrence probability of roughly one in a million.
3.2 Detailed process of troubleshooting
Loading the doctor-consultation IM page is part of the golden path of the business, and every button on it is the entry point of a business line. All alarms on this core logic therefore use custom alarms with no convergence (deduplication) configured: any kind of exception, including an abnormal button count, triggers an alarm immediately.
1. Based on the alarm information, start the investigation, but found the following problems:
(1) No exception logs: following the logId of the alerting request, there were no exception logs anywhere in the process; buttons simply went missing for no apparent reason.
(2) Not reproducible: in the pre-release environment, with the same parameters, the interface returned normally and the problem could not be reproduced.
2. Code analysis, narrow the scope of exceptions:
The IM buttons for doctor consultation are assembled in groups as follows:
// Collection of results from multiple threads
List<DoctorDiagImButtonInfoDTO> multiButtonList = new ArrayList<>();
// Multi-threaded parallel processing
Future<List<DoctorDiagImButtonInfoDTO>> multiButtonFuture = joyThreadPoolTaskExecutor.submit(() -> {
    List<DoctorDiagImButtonInfoDTO> multiButtonListTemp = new ArrayList<>();
    buttonTypes.forEach(buttonType -> {
        multiButtonListTemp.add(appButtonInfoMap.get(buttonType));
    });
    multiButtonList.addAll(multiButtonListTemp);
    return multiButtonListTemp;
});
3. Increase log online observation
Since concurrent scenarios tend to fail inside subthreads, necessary logging was added at each subthread branch and observed after release:
(1) In the request that produced the exception, every subthread completed normally.
(2) The number of missing buttons randomly equaled the number of buttons processed by one of the subthreads.
(3) The preliminary judgment: concurrent addAll operations on the shared ArrayList were the anomaly.
4. Simulation and Reproduction
Simulate and reproduce the problem using the ArrayList source code:
(1) Source Code Analysis of ArrayList:
public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    ensureCapacityInternal(size + numNew);  // Increments modCount
    // Starting from the current size, append the new objects to the array
    System.arraycopy(a, 0, elementData, size, numNew);
    // Update the global size; together with the previous step this is
    // not atomic, which is the root cause of the concurrency issue
    size += numNew;
    return numNew != 0;
}

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
(2) Theoretical Analysis
In ArrayList's add operations, updating size and writing the data into the array are two separate steps, not one atomic operation.
(3) Problem reproduction
The source was copied into a custom class, with pauses inserted to make the concurrent issue easy to reproduce:
public boolean addAll(Collection<? extends E> c) {
    Object[] a = c.toArray();
    int numNew = a.length;
    // First pause: after reading the current size
    try {
        Thread.sleep(1000 * timeout1);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    ensureCapacityInternal(size + numNew);  // Increments modCount
    // Second pause: before the copy
    try {
        Thread.sleep(1000 * timeout2);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    System.arraycopy(a, 0, elementData, size, numNew);
    // Third pause: before size +=
    try {
        Thread.sleep(1000 * timeout3);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    size += numNew;
    return numNew != 0;
}
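The same race can also be observed on a stock ArrayList without the instrumented copy above. This is a sketch (the class name and counts are arbitrary) in which two threads call addAll concurrently and the final size usually falls short of the expected total:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LostAddAllDemo {
    public static void main(String[] args) throws InterruptedException {
        List<Integer> target = new ArrayList<>();
        List<Integer> batch = Arrays.asList(1, 2, 3, 4, 5);
        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                try {
                    // arraycopy and size += are separate steps, so two threads
                    // can write into the same slots or read a stale size
                    target.addAll(batch);
                } catch (ArrayIndexOutOfBoundsException e) {
                    // a stale size can also push the copy past the array's capacity
                }
            }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Expected 10000; under the race the result is usually smaller
        System.out.println("size=" + target.size() + " expected=10000");
    }
}
```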
3.3 Problem solving
Create the ArrayList via the thread-safe wrapper Collections.synchronizedList:
List<DoctorDiagImButtonInfoDTO> multiButtonList = Collections.synchronizedList(new ArrayList<>());
Observation after release showed normal behavior.
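An alternative is also worth noting (a sketch with made-up task contents, not the project's actual code): since each subtask already returns its partial list through a Future, the shared list can be avoided entirely by letting only the calling thread merge the results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureCollectDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        // Each task builds and returns its own private list
        Future<List<String>> f1 = pool.submit(() -> Arrays.asList("refer", "prescribe"));
        Future<List<String>> f2 = pool.submit(() -> Arrays.asList("follow-up"));
        List<String> all = new ArrayList<>();
        all.addAll(f1.get()); // get() blocks until the task is done,
        all.addAll(f2.get()); // so only the main thread ever mutates 'all'
        pool.shutdown();
        System.out.println(all.size()); // 3
    }
}
```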
3.4 Summary and reflection
Using multiple threads to split up work has become commonplace, but any object accessed by multiple threads must be a thread-safe class.
Beyond that, a few fundamental questions are worth internalizing:
(1) The soul of the JMM: the happens-before principle
(2) The soul of the concurrent utility classes: volatile reads/writes plus CAS
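As a closing sketch of the happens-before rule for volatile (field names are illustrative): a write to a volatile variable happens-before every subsequent read of it, which also publishes all ordinary writes made before the volatile write.

```java
public class HappensBeforeDemo {
    private int data = 0;                   // ordinary field
    private volatile boolean ready = false; // volatile flag

    public void writer() {
        data = 42;    // (1) ordinary write
        ready = true; // (2) volatile write: (1) happens-before (2)
    }

    public int reader() {
        // (3) volatile read: if it observes true, the write of data = 42
        // is guaranteed visible here, with no extra locking
        return ready ? data : -1;
    }

    public static void main(String[] args) throws InterruptedException {
        HappensBeforeDemo d = new HappensBeforeDemo();
        Thread w = new Thread(d::writer);
        w.start();
        w.join();
        System.out.println(d.reader()); // 42
    }
}
```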
