Take the following statement.
Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, we process information included in key content tags and attributes, such as Title tags and ALT attributes. Googlebot can process many, but not all, content types. For example, we cannot process the content of most Flash files or dynamic pages.
You can find this text in the Google indexing FAQ here. What I find compelling about this statement is the last line:
we cannot process the content of most Flash files or dynamic pages.
This raises an interesting question about the future of search engines as we enter the Web 3.0 era. As HTML has slowly reached the limits of what it can offer, we are starting to see the rise of new rich/smart client applications based on technologies such as Silverlight, Flex and Flash.
Tim Berners-Lee has predicted that Web 3.0 could possibly be based on vector graphics platforms. These technologies offer rich vector graphics with runtime libraries that are delivered directly to the desktop. With these runtimes comes the use of custom binary formats that allow code to run within the browser and perform functions that standard HTML/Ajax apps could only dream of.
So with the winds of change coming, how will search engines continue to index site content effectively when it is delivered in custom binary formats?
- Will we see new standards defined that will be specifically used for the publication & management of site content for indexing?
- Will search engines implement new techniques for indexing binary files?
- Will our Web 3.0 apps always require HTML equivalents for supporting search providers?
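To make the problem a bit more concrete, here is a minimal sketch of the kind of index the quoted statement describes: every word and its location on the page, plus the title and ALT text. This is only my own illustration using Python's standard `html.parser`; the names `PageTextExtractor` and `index_page` are made up for this post, and it says nothing about how Googlebot is actually implemented.

```python
from html.parser import HTMLParser
import re


class PageTextExtractor(HTMLParser):
    """Collects visible words, the title text and ALT attributes (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.words = []       # body words, in document order
        self.title = ""       # text found inside <title>
        self.alt_texts = []   # values of alt="..." attributes
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        alt = dict(attrs).get("alt")
        if alt:
            self.alt_texts.append(alt)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def index_page(html):
    """Return the title, ALT texts and a word -> list-of-positions map for one page."""
    parser = PageTextExtractor()
    parser.feed(html)
    parser.close()
    positions = {}
    for pos, word in enumerate(parser.words):
        positions.setdefault(word, []).append(pos)
    return {
        "title": parser.title.strip(),
        "alt": parser.alt_texts,
        "positions": positions,
    }


# A plain HTML page exposes its words to the indexer.
html_page = (
    "<html><head><title>Rich Apps</title></head><body>"
    "<p>Flash and Flex and Silverlight</p>"
    '<img src="logo.png" alt="vector logo">'
    "</body></html>"
)

# A page that only embeds a binary plugin file exposes nothing to index.
plugin_page = (
    "<html><body>"
    '<object data="app.swf" type="application/x-shockwave-flash"></object>'
    "</body></html>"
)

print(index_page(html_page))
print(index_page(plugin_page))
```

For the plain HTML page the indexer gets words, positions, a title and ALT text to work with. For the page that only embeds a binary file it gets an empty index, because all of the content lives inside a file this kind of text parser never opens.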
I don't know the answers to these questions yet, but I am looking forward to seeing how it pans out.