Here is a diagram from SEOmoz showing how much weight each factor carries in search engine ranking. For textual web data, this may be a very good distribution. A similar weighting approach has been applied in software engineering research, for example in recommendation systems and feature location. For software-related data (source code or accompanying text), the factors should be different. The question is: what should those factors be replaced with for the various kinds of software artifacts? It may be worth exploring what this pie chart would look like for software engineering.
<pre><code> String foo = "bar"; </code></pre>
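To make the question concrete, here is a minimal sketch of what a factor-weight model for software artifacts could look like: a plain weighted sum over normalized factor values. The class name `ArtifactRanker`, the factor names (`identifierMatch`, `commentMatch`, `changeFrequency`), and the weights are all hypothetical placeholders chosen for illustration. They do not come from the SEOmoz diagram or from any study; which factors and weights are actually appropriate is exactly the open question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ArtifactRanker {

    // Weighted sum of factor values (each assumed to be normalized to [0, 1]).
    // A factor that the artifact does not provide contributes 0.
    static double score(Map<String, Double> weights, Map<String, Double> factors) {
        double total = 0.0;
        for (Map.Entry<String, Double> w : weights.entrySet()) {
            total += w.getValue() * factors.getOrDefault(w.getKey(), 0.0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical factors and weights; these are assumptions, not measured values.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("identifierMatch", 0.5);
        weights.put("commentMatch", 0.25);
        weights.put("changeFrequency", 0.25);

        // Factor values for one source file, as some (unspecified) analysis might produce them.
        Map<String, Double> sourceFile = new LinkedHashMap<>();
        sourceFile.put("identifierMatch", 1.0);
        sourceFile.put("commentMatch", 0.5);

        System.out.println(score(weights, sourceFile));
    }
}
```

With a model of this shape, the pie chart is just the `weights` map: each slice is one factor's share. Different artifact types (source files, bug reports, commit messages) could each get their own map.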