Introduction to our VDOM.pm & vdom-webkit cluster
----
Introduction to our {{#x|VDOM.pm}} & {{#x|vdom-webkit}} cluster

☺{{#author|agentzh@yahoo.cn}}☺

{{#author|章亦春 (agentzh)}}

{{#date|2009.9}}
----
{{#v|VDOM}}

➥ {{#x|Visual}} DOM

➥ DOMs with {{#ci|visual information}}
----
{{#kw|window}} {{#kw|location}}={{#x|"http://foo.bar.com/index.html"}}
    {{#kw|innerHeight}}=802 {{#kw|innerWidth}}=929
    {{#kw|outerHeight}}=943 {{#kw|outerWidth}}=1272 {
  {{#kw|document}} {{#kw|width}}=914 {{#kw|height}}=5119 {
    {{#v|...}}
  }
}
----
{{#kw|BODY}} {{#kw|x}}=0 {{#kw|y}}=0 {{#kw|w}}=914 {{#kw|h}}=5119
    {{#kw|fontFamily}}=\"Helvetica,Arial,sans-serif\"
    {{#kw|fontSize}}=\"12px\" {{#kw|fontStyle}}=\"normal\"
    {{#kw|fontWeight}}=\"400\" {{#kw|color}}=\"rgb(0, 0, 0)\"
    {{#kw|backgroundColor}}=\"rgb(255, 255, 255)\" {
  {{#x|"\\n "}} {{#kw|w}}=0 { }
  {{#kw|DIV}} {{#kw|id}}=\"append_parent\" {{#kw|x}}=0 {{#kw|y}}=0 {{#kw|h}}=0
      {{#kw|backgroundColor}}=\"transparent\" {
    {{#x|"首页\\n\\n"}} {{#kw|x}}=1 {{#kw|y}}=1 {
      {{#v|...}}
    }
  }
  {{#x|"\\n "}} {{#kw|w}}=0 { }
}
----
{{#kw|FONT}} {{#kw|color}}=\"rgb(255, 0, 0)\" {
  {{#kw|B}} {{#kw|fontWeight}}=\"401\" {
    {{#x|"购物"}} {{#kw|h}}=32 {{#kw|w}}=56 { }
  }
}
----
\"Why {{#ci|another}} language?\"

\"Why {{#x|not}} just borrow HTML or XML's syntax?\"
----
{{#cm|✓}} We want to keep VDOM dumps {{#ci|small}}.

{{#cm|✓}} We want to keep VDOM dumps {{#ci|unambiguous}}.

{{#cm|✓}} We want VDOM to be more {{#x|human-readable}} and more {{#x|human-writable}}. (Yeah, XML/HTML syntax is very {{#i|cumbersome}}.)

{{#cm|✓}} We want VDOM {{#i|parsers}} & {{#i|dumpers}} to be {{#ci|trivial}} to implement and verify (tens of lines of Perl, for example ;)).

{{#cm|✓}} Low-level structures like {{#ci|text runs}} and {{#ci|text nodes}} are hard to express naturally in HTML or XML.
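----
To make the "tens of lines of Perl" claim concrete, here is a minimal, hypothetical sketch of a VDOM dump parser. It is not VDOM.pm's actual code: its grammar (a node is a bare tag name or a quoted text string, followed by `name=value` attributes and a brace-enclosed list of child nodes) is inferred only from the sample dumps in these slides, and the sub names `tokenize`, `parse_node`, `parse_vdom` and `show` are illustrative. Attribute values are kept as plain strings.

```perl
#!/usr/bin/env perl
# Hypothetical sketch of a VDOM dump parser; NOT the actual VDOM.pm code.
# Grammar assumed from the sample dumps:
#   node := (NAME | "STRING") attr* '{' node* '}'
#   attr := NAME '=' (NUMBER | NAME | "STRING")
use strict;
use warnings;

# Decode backslash escapes such as \n, \t and \" inside quoted strings.
sub unescape {
    my ($s) = @_;
    my %map = (n => "\n", t => "\t", r => "\r");
    $s =~ s/\\(.)/exists $map{$1} ? $map{$1} : $1/gse;
    return $s;
}

# Split the dump into [TYPE => value] tokens: STR, PUNCT ({ } =) and WORD.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    while ($src =~ /\G\s*(?:("(?:[^"\\]|\\.)*")|([{}=])|([^\s{}="]+))/gcs) {
        my ($str, $punct, $word) = ($1, $2, $3);
        if    (defined $str)   { push @tokens, [STR   => unescape(substr $str, 1, -1)] }
        elsif (defined $punct) { push @tokens, [PUNCT => $punct] }
        else                   { push @tokens, [WORD  => $word] }
    }
    $src =~ /\G\s*\z/gc
        or die "syntax error at offset " . (pos($src) // 0) . "\n";
    return \@tokens;
}

sub is_punct {
    my ($tok, $ch) = @_;
    return $tok && $tok->[0] eq 'PUNCT' && $tok->[1] eq $ch;
}

# Recursive descent: consume one node (and its whole subtree) from the tokens.
sub parse_node {
    my ($tokens) = @_;
    my $head = shift(@$tokens) // die "unexpected end of input\n";
    my $node = { attrs => {}, children => [] };
    if    ($head->[0] eq 'STR')  { $node->{text} = $head->[1] }
    elsif ($head->[0] eq 'WORD') { $node->{tag}  = $head->[1] }
    else                         { die "unexpected '$head->[1]'\n" }

    # Attributes are name=value pairs up to the opening brace.
    until (is_punct($tokens->[0], '{')) {
        my $name = shift @$tokens;
        die "expected attribute name\n" unless $name && $name->[0] eq 'WORD';
        die "expected '=' after $name->[1]\n" unless is_punct(shift(@$tokens), '=');
        my $value = shift @$tokens;
        die "missing value for $name->[1]\n" unless $value && $value->[0] ne 'PUNCT';
        $node->{attrs}{ $name->[1] } = $value->[1];
    }
    shift @$tokens;    # the opening '{'
    push @{ $node->{children} }, parse_node($tokens)
        until is_punct($tokens->[0], '}');
    shift @$tokens;    # the closing '}'
    return $node;
}

sub parse_vdom {
    my ($src) = @_;
    my $tokens = tokenize($src);
    my $root   = parse_node($tokens);
    die "trailing input after the root node\n" if @$tokens;
    return $root;
}

# Print the tree: tags with sorted attributes, text nodes as quoted strings.
sub show {
    my ($node, $depth) = @_;
    my $label = $node->{tag};
    if (!defined $label) {
        ($label = $node->{text}) =~ s/\n/\\n/g;
        $label = qq{"$label"};
    }
    my $attrs = join ' ', map { "$_=$node->{attrs}{$_}" } sort keys %{ $node->{attrs} };
    print '  ' x $depth, $label, ($attrs ne '' ? " $attrs" : ''), "\n";
    show($_, $depth + 1) for @{ $node->{children} };
}

# Usage: a small dump modeled on the sample slides (text made ASCII).
my $root = parse_vdom(<<'VDOM');
window location="http://foo.bar.com/index.html" innerHeight=802 {
  document width=914 height=5119 {
    BODY x=0 y=0 w=914 fontSize="12px" {
      "\n " w=0 { }
      DIV id="append_parent" x=0 y=0 h=0 {
        "Home\n\n" x=1 y=1 { }
      }
    }
  }
}
VDOM
show($root, 0);
```

A recursive-descent parser fits here because the dump's nesting of braces mirrors the DOM tree one-to-one, so each `parse_node` call returns exactly one subtree.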
----
{{#x|☺}} We've already made both Mozilla {{#x|Gecko}} and Apple {{#x|WebKit}} {{#i|emit}} VDOMs
----
{{img src="images/gen-vdom.png" width="544" height="563"}}
----
{{#cm|# Generate VDOM from the command line:}}
{{#v|$}} {{#ci|vdomkit}} --enable-js --proxy=proxy.cn:1080 \\
    http://www.sina.com.cn > sina.vdom

{{#cm|# Or access our vdomkit FastCGI server directly over HTTP:}}
{{#v|$}} curl 'http://vdom.cn.yahoo.com/vdom?{{#x|url=http%3A%2F%2Fwww.sina.com.cn}}' \\
    > sina.vdom
----
{{#cm|# The VDOM dump is about 30% smaller than the original HTML:}}
{{#v|$}} ls -lh {{#ci|sina.vdom}}
-rw------- 1 agentz agentz {{#x|278K}} 2009-04-10 10:30 sina.vdom

{{#v|$}} ls -lh {{#ci|sina.html}}
-rw-r--r-- 1 agentz agentz {{#x|400K}} 2009-04-10 10:34 sina.html
----
{{#cm|✓}} Now {{#ci|Perl}} enjoys {{#x|very powerful DOMs}}, as good as those in JavaScript.
----
{{#kw|use}} VDOM;

{{#kw|open my}} {{#v|$in}}, {{#x|'<'}}, {{#x|"sina.vdom"}} or die {{#v|$!}};
{{#kw|my}} {{#v|$win}} = VDOM::Window->new->parse_file({{#v|$in}});
{{#kw|my}} {{#v|$body}} = {{#v|$win}}->document->body;
{{#kw|for my}} {{#v|$child}} ({{#v|$body}}->childNodes) {
    print {{#v|$child}}->tagName;
    print {{#v|$child}}->x;
    print {{#v|$child}}->h;
    print {{#v|$child}}->color;
    print {{#v|$child}}->fontFamily;
    {{#v|...}}
}
----
print {{#v|$child}}->nextSibling;
{{#v|$win}}->document->getElementById({{#x|"foo"}});

{{#cm|# These are Firefox 3.1 DOM methods; we have them too ;)}}
print {{#v|$child}}->previousElementSibling;
print {{#v|$child}}->firstElementChild;
print {{#v|$child}}->parentNode;
print {{#kw|join}} {{#x|' '}}, {{#kw|map}} { {{#v|$$_}}->href . {{#x|': '}} .
          {{#v|$$_}}->textContent } {{#v|$child}}->getElementsByTagName({{#x|"A"}});
----
{{img src="images/vdom-pm.png" width="449" height="1176"}}
----
{{img src="images/vdom-pm2.png" width="373" height="285"}}
----
{{#cm|☺}} {{#i|Debug}} our Perl code from within {{#ci|Firefox}} via our {{#x|Visual DOM}} extension
----
{{img src="images/visualdom-ch.png" width="863" height="636"}}
----
{{img src="images/visualdom-ch-cfg.png" width="1016" height="762"}}
----
{{img src="images/between-ff-perl.png" width="726" height="648"}}
----
{{img src="images/visualdom-lh.png" width="1016" height="762"}}
----
{{img src="images/visualdom-lh-cfg.png" width="863" height="636"}}
----
{{#cm|☺}} The {{#ci|qt-webkit port}} of our {{#x|Visual DOM}} extension: {{#i|VDOM Browser}}
----
{{img src="images/vdom-browser-config.png" width="1280" height="1024"}}
----
{{img src="images/ch-eeee.png" width="1280" height="968"}}
----
{{img src="images/ch-bbs-big.png" width="1280" height="968"}}
----
{{img src="images/ch-bbs-guided.png" width="1034" height="658"}}
----
{{img src="images/between-vdombrowser-perl.png" width="611" height="648"}}
----
{{#cm|☺}} We can get geometry information for every {{#ci|text node}} in the DOM!
----
{{img src="images/vb-text-nodes.png" width="1047" height="797"}}
----
...or even for pieces as small as {{#ci|text runs}}! (A text run is an indivisible segment of a text node with no line breaks inside it.)
----
{{img src="images/vb-text-runs.png" width="1047" height="797"}}
----
{{#cm|☺}} Put everything into a {{#ci|cluster}}.
----
{{img src="images/cluster-arch.png" width="938" height="501"}}
----
{{img src="images/vdomwebkit-farm.png" width="381" height="475"}}
----
{{img src="images/proxy-guts2.png" width="379" height="355"}}
----
{{img src="images/prefetcher-guts.png" width="700" height="600"}}
----
{{img src="images/memcacheq-guts.png" width="253" height="151"}}
----
{{#cm|☺}} Most of the components have been {{#ci|open-sourced}}
----
QtWebKit with {{#ci|VDOM support}}

➥ {{http://github.com/agentzh/vdomwebkit/}}
----
vdomkit ({{#ci|command-line}} utility and {{#ci|web}} interface)

➥ {{http://github.com/agentzh/vdomkit/}}
----
VDOM Browser

➥ {{http://github.com/agentzh/vdombrowser/}}
----
VDOM.pm

➥ {{http://github.com/agentzh/vdompm/}}
----
Queue-size-aware version of {{#ci|memcacheq}}

➥ {{http://github.com/agentzh/memcacheq/}}
----
Queue::Memcached::Buffered (a {{#ci|Perl client}} for memcacheq)

➥ {{http://github.com/agentzh/queue-memcached-buffered/}}
----
{{#x|Acknowledgements}}

{{#x|☺}} haibo++ persuaded me that {{#ci|separating}} browser rendering engines from our hunter extractors via VDOM dumps would bring {{#ci|lots}} of benefits.

{{#x|☺}} jianingy++ effectively {{#ci|sparked}} the great WebKit craze in our team.

{{#x|☺}} xunxin++ {{#ci|ported}} the Visual DOM extension's JavaScript VDOM dumper to qt-webkit C++ and did most of the hard work in {{#ci|vdom-webkit}}.

{{#x|☺}} xunxin++ also {{#ci|patched}} sina's memcacheq to make it aware of queue sizes.

{{#x|☺}} mingyou++ shared a great deal of his {{#ci|knowledge}} of WebKit internals with us and also gave very good suggestions for the slides you're browsing.
----
☺ {{#ci|Any questions}}? ☺
----