* What I found in Performance Tuning
** BETTER MACHINE
- "We need more CPU cores. We need more memory; the 96G currently on our server is not enough. We need SSDs."
** BETTER ALGORITHM
- use an array instead of a HashMap
** CHOOSE PROPER INITIAL SIZE
- StringBuilder
- HashMap && ConcurrentHashMap && ArrayList ...
- YOUNG & OLD generation's size
+ GC in JAVA
GC for Eden Space:
GC for Old Generation:
+ HotSpot VM options:
-Xmn
-Xms256m -Xmx512m
-XX:NewRatio=n //Ratio of old/new generation sizes. The default value is 2.
-XX:SurvivorRatio=n //Ratio of eden/survivor space size. The default value is 8.
-XX:MaxTenuringThreshold=n
-XX:PermSize -XX:MaxPermSize
-XX:+UseConcMarkSweepGC
-XX:ParallelGCThreads=n
+ If most of the application's data is short-lived, expand the YOUNG generation;
otherwise, expand the OLD generation.
Find more GC options of the HotSpot VM.
+ About Java GC collectors
-XX:+UseConcMarkSweepGC
+ SoftReference, WeakReference
+ Use case of PhantomReference
+ Apple has a compile-time technique that avoids the performance problems caused by GC: Automatic Reference Counting
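A minimal sketch of the initial-size items at the top of this section (StringBuilder, HashMap, ArrayList): each container is pre-sized for a known element count so that it does not have to grow and copy while it is being filled. The class and method names are hypothetical, the HashMap formula assumes the default load factor of 0.75, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PresizeSketch {
    // Hypothetical helper: an initial capacity that holds `expected` entries
    // without a rehash, assuming HashMap's default load factor of 0.75.
    static int hashMapCapacityFor(int expected) {
        return (int) (expected / 0.75f) + 1;
    }

    static String joinNumbers(int count) {
        // Rough estimate of 8 chars per item. If it is too small the builder
        // still grows, it just does so less often.
        StringBuilder sb = new StringBuilder(count * 8);
        for (int i = 0; i < count; i++) {
            sb.append(i).append(',');
        }
        return sb.toString();
    }

    static Map<Integer, String> buildMap(int count) {
        Map<Integer, String> map = new HashMap<>(hashMapCapacityFor(count));
        for (int i = 0; i < count; i++) {
            map.put(i, "v" + i);
        }
        return map;
    }

    static List<Integer> buildList(int count) {
        // ArrayList takes the element count directly.
        List<Integer> list = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            list.add(i);
        }
        return list;
    }
}
```

The gain depends on how well the element count is known up front; a wildly oversized guess just wastes memory instead.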
** LOWER THE MEMORY CONSUMPTION
- String's memory structure in JVM
- structure of Object in JVM
+ a normal object requires 8 bytes of "housekeeping" space;
OpenJDK6: src/hotspot/src/share/vm/oops/markOop.hpp
// 32 bits:
// --------
// hash:25 ------------>| age:4 biased_lock:1 lock:2 (normal object)
// JavaThread*:23 epoch:2 age:4 biased_lock:1 lock:2 (biased object)
// size:32 ------------------------------------------>| (CMS free block)
// PromotedObject*:29 ---------->| promo_bits:3 ----->| (CMS promoted object)
+ arrays require 12 bytes (the same as a normal object, plus 4 bytes for the array length).
- structure of String Object
public final class String
  8   | String's object header
  4   | private final char[] value;  ---------.
  4   | private final int offset;             |
  4   | private final int count;              |
  4   | private int hash;                     |
                                              |
  8   | char array's object header  <---------'
  4   | array length
  N*2 | bytes of N characters
  P   | bytes of padding to a multiple of 8
So an empty string uses 40 bytes.
In a 64-bit JVM (without compressed oops), the object header uses 16 bytes.
- string's substring implementation
- StringBuilder/StringBuffer use less memory.
- memory compression
- snappy-java
- Off-heap memory #find more
- BigMemory
- Story about not using AtomicInteger
- it performs better than a synchronized increment (compare InterlockedIncrement, __sync_fetch_and_add)
- it uses more memory
- IN THAT STORY: it could be avoided through hash re-dispatch
- AtomicReference
Java volatile reference vs. AtomicReference (compareAndSet() is used in queue implementations)
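Going back to the String layout earlier in this section, the arithmetic can be written down as a small estimator. This is only a sketch: it assumes a 32-bit HotSpot JVM with the JDK 6 String fields shown in that layout (value, offset, count, hash), and the class and method names are hypothetical.

```java
final class StringSizeSketch {
    static final int OBJECT_HEADER = 8;      // object header on a 32-bit JVM
    static final int STRING_FIELDS = 4 * 4;  // value reference, offset, count, hash
    static final int ARRAY_LENGTH = 4;       // length slot of the char array

    // Rounds up to the next multiple of 8, the object alignment.
    static int alignTo8(int bytes) {
        return (bytes + 7) & ~7;
    }

    // Estimated bytes for a String of `chars` characters that owns its char array.
    static int estimateStringBytes(int chars) {
        int stringObject = alignTo8(OBJECT_HEADER + STRING_FIELDS);
        int charArray = alignTo8(OBJECT_HEADER + ARRAY_LENGTH + 2 * chars);
        return stringObject + charArray;
    }

    public static void main(String[] args) {
        System.out.println(estimateStringBytes(0));
        System.out.println(estimateStringBytes(10));
    }
}
```

For an empty string this gives 24 bytes for the String object plus 16 for the padded empty array, which matches the 40 bytes noted above.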
** TOOLS
*** jps
Use jps to find the correct JVM process ID.
C:\Documents and Settings\Administrator>jps -lmVv
3672 org.nasa.marsrovers.Main -agentlib:jdwp=transport=dt_socket,suspend=y,address=localhost:4176 -Dfile.encoding=UTF-8 -Xbootclasspath:C:\Program Files\Java\jre6\lib\resources.jar;C:\Program Files\Java\jre6\lib\rt.jar;C:\Program Files\Java\jre6\lib\jsse.jar;C:\Program Files\Java\jre6\lib\jce.jar;C:\Program Files\Java\jre6\lib\charsets.jar
3380 sun.tools.jps.Jps -lmVv -Denv.class.path=.;C:\Program Files\Java\jdk1.6.0_22\jre\lib;C:\Program Files\Java\jdk1.6.0_22\lib; -Dapplication.home=C:\Program Files\Java\jdk1.6.0_22 -Xms8m
552 -Dosgi.requiredJavaVersion=1.5 -Xms40m -Xmx384m -XX:MaxPermSize=256m
*** jstack
+ look at stack patterns to find bugs
We once spent three days reproducing a bug. If we had grabbed a thread dump
before restarting the server, the bug would have been very easy to find.
The same holds for .NET and C++: before restarting a server, the first thing
to do is collect more information (dumps, memory usage, CPU usage).
$ jstack -l 3672
2012-08-12 22:48:17
Full thread dump Java HotSpot(TM) Client VM (17.1-b03 mixed mode):
"Low Memory Detector" daemon prio=6 tid=0x16bb2800 nid=0x1314 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"CompilerThread0" daemon prio=10 tid=0x16baf400 nid=0xbd4 waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"JDWP Command Reader" daemon prio=6 tid=0x16bad000 nid=0xdc4 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"JDWP Event Helper Thread" daemon prio=6 tid=0x16bab000 nid=0x16c8 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"JDWP Transport Listener: dt_socket" daemon prio=6 tid=0x16ba8c00 nid=0x11d4 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Attach Listener" daemon prio=10 tid=0x16b99000 nid=0x168c waiting on condition [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Signal Dispatcher" daemon prio=10 tid=0x16bb3400 nid=0x1320 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
Locked ownable synchronizers:
- None
"Finalizer" daemon prio=8 tid=0x16b85000 nid=0x1678 in Object.wait() [0x16cff000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x029d0b28> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
- locked <0x029d0b28> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
Locked ownable synchronizers:
- None
"Reference Handler" daemon prio=10 tid=0x16b83c00 nid=0x13ec in Object.wait() [0x16caf000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x029d0a28> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:485)
at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
- locked <0x029d0a28> (a java.lang.ref.Reference$Lock)
Locked ownable synchronizers:
- None
"main" prio=6 tid=0x00847000 nid=0x1464 runnable [0x0091f000]
java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
- locked <0x029e19f0> (a java.io.BufferedInputStream)
at sun.nio.cs.StreamDecoder.readBytes(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
- locked <0x02b63a00> (a java.io.InputStreamReader)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
- locked <0x02b63a00> (a java.io.InputStreamReader)
at java.io.BufferedReader.readLine(Unknown Source)
at org.nasa.marsrovers.simulator.Simulator.startUp(Simulator.java:53)
at org.nasa.marsrovers.Main.main(Main.java:50)
Locked ownable synchronizers:
- None
"VM Thread" prio=10 tid=0x16b81400 nid=0xef0 runnable
"VM Periodic Task Thread" prio=10 tid=0x16bc9000 nid=0x304 waiting on condition
JNI global references: 1508
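When jstack is not at hand, a rough in-process counterpart can be built on Thread.getAllStackTraces(), for example to log a dump right before a planned restart. This is a sketch, not a replacement for jstack: it prints no lock or synchronizer information, and the class and method names are hypothetical.

```java
import java.util.Map;

final class ThreadDumpSketch {
    // Builds a text dump of all live threads, loosely shaped like jstack output.
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            out.append('"').append(t.getName()).append('"')
               .append(t.isDaemon() ? " daemon" : "")
               .append(" prio=").append(t.getPriority())
               .append(" state=").append(t.getState())
               .append('\n');
            for (StackTraceElement frame : e.getValue()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```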
*** jmap
**** show memory usage
08:57 ~ $ jmap -heap 623
Attaching to process ID 623, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 20.6-b01-414
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 838860800 (800.0MB)
NewSize = 21757952 (20.75MB)
MaxNewSize = 174456832 (166.375MB)
OldSize = 65404928 (62.375MB)
NewRatio = 7
SurvivorRatio = 8
PermSize = 21757952 (20.75MB)
MaxPermSize = 367001600 (350.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 19595264 (18.6875MB)
used = 16274240 (15.52032470703125MB)
free = 3321024 (3.16717529296875MB)
83.0519047867893% used
Eden Space:
capacity = 17432576 (16.625MB)
used = 15573992 (14.852516174316406MB)
free = 1858584 (1.7724838256835938MB)
89.33844315378289% used
From Space:
capacity = 2162688 (2.0625MB)
used = 700248 (0.6678085327148438MB)
free = 1462440 (1.3946914672851562MB)
32.37859552556818% used
To Space:
capacity = 2162688 (2.0625MB)
used = 0 (0.0MB)
free = 2162688 (2.0625MB)
0.0% used
concurrent mark-sweep generation:
capacity = 154337280 (147.1875MB)
used = 120364648 (114.7886734008789MB)
free = 33972632 (32.398826599121094MB)
77.98805836153132% used
Perm Generation:
capacity = 176562176 (168.3828125MB)
used = 134844944 (128.59815979003906MB)
free = 41717232 (39.78465270996094MB)
76.37249781062961% used
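Similar per-pool numbers can be read from inside the process through the java.lang.management API, which is handy when attaching jmap is not an option. A minimal sketch with hypothetical class and method names; the pool names it prints depend on the collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class HeapUsageSketch {
    // One line per memory pool (eden, survivor, old generation, ...).
    static String describePools() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            out.append(String.format("%-32s %-16s used=%d committed=%d max=%d%n",
                    pool.getName(), pool.getType(),
                    usage.getUsed(), usage.getCommitted(), usage.getMax()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describePools());
    }
}
```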
**** show memory usage by object type
09:02 ~ $ jmap -histo 623 | head -n 20
num #instances #bytes class name
----------------------------------------------
1: 57975 34551008 [B
2: 304951 33637464 [C
3: 166951 23748152 <constMethodKlass>
4: 166951 22721672 <methodKlass>
5: 21247 21762832 <constantPoolKlass>
6: 279214 18444400 <symbolKlass>
7: 21247 17749960 <instanceKlassKlass>
8: 20092 13129152 <constantPoolCacheKlass>
9: 304840 9754880 java.lang.String
10: 53366 4497624 [Ljava.lang.Object;
11: 15038 4360504 [I
12: 122304 3913728 java.util.HashMap$Entry
13: 22426 2332304 java.lang.Class
14: 4146 2094728 <methodDataKlass>
15: 32534 1717344 [S
16: 12637 1676168 [Ljava.util.HashMap$Entry;
17: 35117 1544960 [[I
**** get a JVM heap dump
$ jmap -dump:file=/tmp/demo.map 91440
Dumping heap to /private/tmp/demo.map ...
Heap dump file created
or have the JVM write a dump automatically on OutOfMemoryError:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=C:/temp/oom.hprof
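A third option is to trigger the dump from code through HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JVM on Java 7 or later; the class name is hypothetical. The target file must not exist yet, and recent JDKs expect the name to end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.lang.management.ManagementFactory;

final class HeapDumpSketch {
    // Writes a heap dump of the current JVM to `path`.
    // liveObjectsOnly = true dumps only reachable objects.
    static File dump(String path, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveObjectsOnly);
        } catch (IOException e) {
            throw new IllegalStateException("Heap dump failed: " + path, e);
        }
        return new File(path);
    }
}
```

The resulting file can be opened with the same tools as a jmap dump.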
**** analyze heap dump
+ $ jhat /private/tmp/demo.map
Reading from /private/tmp/demo.map...
Started HTTP server on port 7000
+ As with !dumpheap -type Exception in WinDbg + SOS, you can use OQL to find
more useful information in the dumped heap, for example:
select file.path.toString() from java.io.File file
**** use Eclipse MAT http://www.eclipse.org/mat to analyze the dump file
- articles about MAT
**** Monitor GC status of JVM
- jstat
$ jstat -gcutil 21891 250 7
S0 S1 E O P YGC YGCT FGC FGCT GCT
12.44 0.00 27.20 9.49 96.70 78 0.176 5 0.495 0.672
12.44 0.00 62.16 9.49 96.70 78 0.176 5 0.495 0.672
12.44 0.00 83.97 9.49 96.70 78 0.176 5 0.495 0.672
0.00 7.74 0.00 9.51 96.70 79 0.177 5 0.495 0.673
0.00 7.74 23.37 9.51 96.70 79 0.177 5 0.495 0.673
0.00 7.74 43.82 9.51 96.70 79 0.177 5 0.495 0.673
0.00 7.74 58.11 9.51 96.71 79 0.177 5 0.495 0.673
- VisualVM
When CPU usage is high, use VisualVM to find the longest-running methods.
When CPU usage is low, take a few stack samples and see where threads are blocked:
queue.put? queue.take?
- nmon: a great tool to monitor CPU, memory, network, disks...
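The collection counts and times that jstat reports (YGC, YGCT, FGC, FGCT) are also available in-process through GarbageCollectorMXBean, which makes it easy to log them periodically. A minimal sketch with hypothetical class and method names; collector names vary with the GC that is configured.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcStatsSketch {
    // One line per collector: cumulative collection count and time since JVM start.
    static String snapshot() {
        StringBuilder out = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            out.append(String.format("%s: collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(snapshot());
    }
}
```

Sampling this twice and subtracting gives the GC activity over an interval, much like consecutive jstat rows.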
** OTHERS
*** JAVA AS SCRIPT
aim: write business calculations/logic as scripts
problem: calling Groovy/Python from Java was too slow.
- write the business calculation/logic to a file XXX.java, then compile it and load it with a customized class loader.
- the Java class can be reloaded after the business calculation/logic changes, without restarting the Java instance.
- use scripts to trace data
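The compile-and-reload idea can be sketched with the standard javax.tools compiler API and a fresh URLClassLoader per version. This is an illustration under assumptions, not the setup used in the notes above: it needs a JDK (not a JRE) at runtime, the rule class must be public, and the names HotReloadSketch, compileAndLoad and price are hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

final class HotReloadSketch {
    // Compiles `source` as public class `className` into a fresh temp directory
    // and loads it with a new class loader, so each call yields a new version.
    static Class<?> compileAndLoad(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler: run on a JDK, not a JRE");
        }
        try {
            Path dir = Files.createTempDirectory("rules");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalArgumentException("Compilation failed for " + className);
            }
            // The loader is intentionally left open: the class needs it while in use.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { dir.toUri().toURL() },
                    HotReloadSketch.class.getClassLoader());
            return loader.loadClass(className);
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("Could not load " + className, e);
        }
    }

    // Calls the rule's `public static int price(int)` method by reflection.
    static int price(Class<?> rule, int quantity) {
        try {
            return (Integer) rule.getMethod("price", int.class).invoke(null, quantity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling compileAndLoad again with changed source returns a second Class object from a second loader, while code holding the first one keeps the old behavior. The sketch does not clean up its temp directories or close old loaders.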
*** NETWORK
- On some multi-CPU machines, if not properly configured,
all network interfaces' interrupts go to CPU0 (as shown below).
Network performance is then limited by the capacity of CPU0. See here for the solution.
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
65: 20041 0 0 0 IR-PCI-MSI-edge eth0-tx-0
66: 20232 0 0 0 IR-PCI-MSI-edge eth0-tx-1
67: 20105 0 0 0 IR-PCI-MSI-edge eth0-tx-2
68: 20423 0 0 0 IR-PCI-MSI-edge eth0-tx-3
69: 21036 0 0 0 IR-PCI-MSI-edge eth0-rx-0
70: 20201 0 0 0 IR-PCI-MSI-edge eth0-rx-1
71: 20587 0 0 0 IR-PCI-MSI-edge eth0-rx-2
72: 20853 0 0 0 IR-PCI-MSI-edge eth0-rx-3
- set_thread_affinity
Restricting a process to run on a single CPU also avoids the performance cost
caused by the cache invalidation that occurs when a process ceases to execute
on one CPU and then recommences execution on a different CPU.
int sched_setaffinity(pid_t pid,size_t cpusetsize,cpu_set_t *mask);
- send several small packets in one big packet.
- PF_RING
- Google Protocol Buffers
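Sending several small messages as one big packet needs some framing so the receiver can split them again. A minimal sketch using a 4-byte length prefix per message; the framing format and the class name are assumptions made for illustration, not a protocol taken from these notes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BatchSketch {
    // Packs messages as [int length][UTF-8 bytes]... so they can go out in one write.
    static ByteBuffer pack(List<String> messages) {
        List<byte[]> encoded = new ArrayList<>();
        int size = 0;
        for (String message : messages) {
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            encoded.add(bytes);
            size += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    // Splits a packed buffer back into the original messages.
    static List<String> unpack(ByteBuffer buffer) {
        List<String> messages = new ArrayList<>();
        while (buffer.remaining() >= 4) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes);
            messages.add(new String(bytes, StandardCharsets.UTF_8));
        }
        return messages;
    }
}
```

The trade-off is latency: a message waits until the batch is flushed, so batches are usually capped by both size and time.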
*** thread local storage to avoid allocating new objects
Java: ThreadLocal
gcc: __thread int i;
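A minimal Java sketch of the ThreadLocal variant: each thread keeps one StringBuilder and resets it per call instead of allocating a new one. The class and method names are hypothetical, and ThreadLocal.withInitial assumes Java 8 or later.

```java
final class ThreadLocalBufferSketch {
    // One StringBuilder per thread, created lazily on that thread's first use.
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    static String format(String key, long value) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // reset the content but keep the allocated char array
        return sb.append(key).append('=').append(value).toString();
    }

    // Exposed only so the reuse can be observed.
    static StringBuilder bufferOfCurrentThread() {
        return BUFFER.get();
    }
}
```

The result string is still a new object on every call; what is saved is the builder and its backing array. In thread pools the buffer lives as long as the thread does, so it should stay small.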
- use arrays if possible
- use hash dispatch to avoid locking
- log sampling: if a log entry is not important, enqueue it with queue.offer() so it is dropped when the queue is full
- keep the thread count low
- use a message loop instead of timers
- throw fewer exceptions
- use a double-buffered queue
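The "hash dispatch to avoid locking" item can be sketched as follows: every key is routed by its hash to one single-threaded worker, so each counter map is touched by exactly one thread and needs neither locks nor atomics. This is one possible reading of the idea; the class name and structure are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class HashDispatchSketch {
    private final List<ExecutorService> workers = new ArrayList<>();
    // counts.get(i) is read and written only by the thread of workers.get(i).
    private final List<Map<String, Integer>> counts = new ArrayList<>();

    HashDispatchSketch(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(Executors.newSingleThreadExecutor(r -> {
                Thread t = new Thread(r);
                t.setDaemon(true); // do not keep the JVM alive in this sketch
                return t;
            }));
            counts.add(new HashMap<>());
        }
    }

    // Routes the key to its owning worker; the same key always lands on the same thread.
    void increment(String key) {
        int slot = Math.floorMod(key.hashCode(), workers.size());
        Map<String, Integer> owned = counts.get(slot);
        workers.get(slot).execute(() -> owned.merge(key, 1, Integer::sum));
    }

    // Collects all counters by asking each worker to copy its own map.
    Map<String, Integer> snapshot() {
        Map<String, Integer> total = new HashMap<>();
        for (int i = 0; i < workers.size(); i++) {
            Map<String, Integer> owned = counts.get(i);
            Callable<Map<String, Integer>> copy = () -> new HashMap<>(owned);
            try {
                // Runs on the owning thread, after every increment queued before it.
                total.putAll(workers.get(i).submit(copy).get());
            } catch (InterruptedException | ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return total;
    }
}
```

The hand-off queue inside each executor still synchronizes internally, so the benefit shows up when the per-key work is larger than a single increment or when hot keys would otherwise contend on one lock.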
** THE END