pdf
|<<
<
>
>>|
/
The Way of {{#i|Optimizing}} and {{#i|Troubleshooting}} Our {{#x|Lua Waf}} ☺{{#author|agentzh@gmail.com}}☺ {{#author|Yichun Zhang (agentzh)}} {{#date|2013.04.19}} ---- {{#x|♡}} Dane now {{#ci|loves}} to open a conversation with me via a {{#x|Flame Graph}}. ---- {{img src="images/flamegraph-myday.png" width="900" height="250"}} ---- {{#x|♡}} One day, Dane walks to me with a {{#ci|weird}} C-level {{#x|Flame Graph}} for our online WAF... ---- {{img src="images/nginx-waf-20130405.png" width="885" height="450"}} ---- {{#x|♡}} About 80% of the C-level {{#x|backtraces}} are {{#ci|bad}} in the graph. ---- {{#x|♡}} Fortunately we have a good {{#ci|Lua-land}} {{#x|Flame Graph}} for the same process. ---- {{img src="images/nginx-waf-lua-20130405.png" width="977" height="210"}} ---- {{#x|♡}} We are using the PCRE {{#x|Just-In-Time}} compiler. JITted code does {{#ci|not}} have debug information required by a proper C-level backtrace. ---- {{#x|♡}} We now know {{#ci|PCRE}} is busy! We now know our {{#ci|regexes}} are slow! ---- {{#x|♡}} {{#c|ngx-pcre-stats}} is a simple tool based on {{#x|systemtap}}. ---- {{#v|$}} ./ngx-pcre-stats -p 24528 {{#c|--exec-time-dist}} Tracing 24528 (/path/to/nginx/sbin/nginx)... Hit Ctrl-C to end. ^C Logarithmic histogram for pcre_exec running time distribution (us): value |-------------------------------------------------- count 0 | 0 1 | 0 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 981 4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1479 8 | 16 16 | 18 32 | 1 ---- {{#v|$}} ./ngx-pcre-stats -p 24528 {{#c|--total-time-top}} --luajit20 Tracing 24528 (/path/to/nginx/sbin/nginx)... Hit Ctrl-C to end. ^C Top N regexes with {{#x|longest total running time}}: 1. pattern /WEB_ATTACK/: 15103us (total data size: 82184) 2. pattern / __cf__\d+/: 11143us (total data size: 25916) 3. pattern /[^\x01-\xff]/: 10233us (total data size: 102825) 4. pattern /\b(?:coalesce\b|root\@)/: 7017us (total data size: 78230) 5. pattern /(Content-Length|Transfer-Encoding)/: 6766us (total data size: 17871) ... ---- {{#v|$}} ./ngx-pcre-stats -p 24528 {{#c|--worst-time-top}} --luajit20 Tracing 24528 (/path/to/nginx/sbin/nginx)... Hit Ctrl-C to end. ^C Top N regexes with {{#x|worst running time}}: 1. pattern /\.cookie\b.*?\;\W*?domain\W*?\=/: 98us (data size: 36) 2. pattern /(Content-Length|Transfer-Encoding)/: 89us (data size: 14) 3. pattern / __cf__\d+/: 63us (data size: 8) 4. pattern /[^\x01-\xff]/: 53us (data size: 13) 5. pattern /\b(background|dynsrc|href|lowsrc|src)\b\W*?=/: 53us (data size: 5147) 6. pattern /(?i:<embed[ /+\t].*?SRC.*?=)/: 47us (data size: 304) 7. pattern /(fromcharcode|alert|eval)\s*\(/: 45us (data size: 24) 8. pattern /\bselect\b.*?\bto_number\b/: 40us (data size: 5147) ---- {{#v|John Graham-Cumming}}: This is very {{#ci|helpful}}. {{#v|John Graham-Cumming}}: And today I used it to {{#x|kill off}} four of them. ---- {{#x|♡}} Another day, Dane comes to me with another {{#x|Flame Graph}} and tells me that some Nginx worker processes are {{#ci|hot spinning}}! ---- {{img src="images/nginx-waf-20130417.png" width="901" height="350"}} ---- {{#x|♡}} The Lua code seems to be {{#x|spining}} around some {{#ci|string.find()}} calls. ---- {{#x|♡}} Let's find out the {{#ci|Lua}}-land {{#x|backtrace}}! ---- {{#x|♡}} After GDB {{#ci|crashes}} our processes for 6 times, I quickly write another {{#c|tool}} named ngx-lua-bt based on {{#x|systemtap}}... ---- {{#v|$}} {{#x|./ngx-lua-bt}} -p 7599 --luajit20 WARNING: Tracing 7599 (/path/to/nginx/sbin/nginx) for LuaJIT 2.0... C:lj_cf_string_find @/usr/local/nginx-waf/lua/waf-core.lua:201 @/usr/local/nginx-waf/lua/waf-core.lua:676 @/usr/local/nginx-waf/lua/waf-core.lua:1467 @/usr/local/nginx-waf/lua/waf-core.lua:1074 @/usr/local/nginx-waf/lua/rules/oldwaf/compiled.lua:371 @/usr/local/nginx-waf/lua/waf.lua:57 C:lj_ffh_pcall @/usr/local/nginx-waf/lua/waf.lua:50 access_by_lua:1 ---- {{#x|♡}} The problematic {{#ci|infinite}} loop in John Graham-Cumming's {{#x|Lua code}}... ---- {{#kw|while}} start {{#kw|do}} {{#kw|local}} stop = string.find(v, {{#x|"}"}}, start, {{#kw|true}}) {{#kw|if}} stop {{#kw|then}} ... {{#kw|end}} {{#kw|end}} ---- {{#v|John Graham-Cumming}}: Well, that's the {{#ci|perfect}} bug report. Even contains the {{#x|fix}}! ---- ☺ {{#ci|Any questions}}? ☺