A lawsuit between Gatehouse Media and the New York Times Co., owner of the Boston Globe, has recently been resolved.

According to numerous news sources, Boston.com and its new “Your Town” websites were accused of scraping headlines, ledes and links from Gatehouse’s “Wicked Local” websites.

Web scraping, or harvesting, is described as a method of extracting content from a website. – From www.kneoteric.com/knowledge-base/glossary/glossary.html

While scraping is not inherently evil, it can raise plagiarism concerns along with other editorial and business issues. Plenty of websites compile or aggregate headlines and content with much success, and that’s perfectly OK. However, because Boston.com and Gatehouse’s websites were in direct competition in the same regional market, the federal court in Boston decided that in this case the scraping was not entirely on the up-and-up.

The case may also set a fascinating precedent with a far-reaching effect on editorial websites and linking behavior in general. More importantly, it is a case that should be considered extremely important for Helium writers who use links in their articles to great (and not so great) effect.

As Helium does not allow plagiarism at any level, scraping of content word-for-word is strictly prohibited. However, members are allowed to link to related sites and articles that fit into the context of what they are writing about. It is also OK to summarize or paraphrase certain passages on other sites that you may link to. In either case, this will ensure that you are not copying editorial property of another company or person.

Referring to headlines in your article is a different case. Say you have an article about the reproductive habits of the Duck-billed Platypus and you want to support a new finding reported in a recent scientific article. It is OK to write something like:

New findings have been recently made regarding the season that the Duck-billed Platypus prefers to mate. In the article called “Name of Article” [with link inserted] in the “name of publication” so and so states that…

Titles on Helium should also be looked at differently. Many readers and writers may find that a title on Helium may also exist on Associated Content, About.com and some random blog. Many titles such as “How to do this,” “A guide to that,” or “Tips for doing such and such” are considered public domain types of titles. They are commonly used by everyone and are recognized as such.

However, a title such as “The classification of the genome for aggressiveness found in the saliva of the leaf-cutter ants of Belize” is probably a no-go, because you would be copying a distinct and unique title that was specifically written about a new concept that is not considered public domain – unless it’s your original article. If, on the other hand, you add “Analyzing Joe Schmoe’s article called …” at the beginning, you acknowledge the original source and that, as they say, is kosher.

The case between Gatehouse Media and the New York Times Co. is not the first time one online editorial entity has taken issue with another over links and scraping, and it will likely not be the last. But as a Helium member, do the safe and ethical thing by providing context, credit and links any chance you get.

While you may not get immediate results, you will improve your reputation as an online writer of contextual substance with sound ethics.

(Disclosure: Helium.com has a business relationship with Gatehouse Media and maintains a strictly neutral voice in this case. Helium.com, its staff and its members were not involved in this case and did not participate in the creation of any content involved in it.)

A common SEO (Search Engine Optimization) term that is used nowadays is “link juice.” So what is link juice?

Link juice basically refers to the passing of authority from other sites to a destination site.


Let’s back up for a second. When we talk SEO, most people are really talking about ranking in Google. Google has a patented method for its ranking engine called PageRank. Google has stated that it uses PageRank plus various other methods and offsets to rank its index, but PageRank is the best known.

So how does PageRank basically work? In simplified terms, each site starts out with a PageRank of 1.0. Google then divides a site’s PageRank among every site it links to and passes those shares on. So, if a site with PageRank 1.0 (we’ll refer to it as site A) linked to 4 sites (B-E), it would pass on 0.25 of PageRank to each.

Now each of those sites has a PageRank of 1.25 (1.0 original + 0.25 from A). They then link out to a certain number of sites, and Google divides their PageRank of 1.25 by the number of outbound links they have and passes that on. And so on and so forth through many iterations.
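The round of juice-passing described above can be sketched in a few lines of Python. To be clear, this is a toy sketch of the simplified model used in this post, not Google’s actual ranking algorithm (the real PageRank involves a damping factor and iterates to convergence); the site names, link structure and helper function are made up for illustration.

```python
def pass_link_juice(scores, links):
    """One round of the simplified model: each site splits its current
    score evenly among the sites it links to, and each recipient adds
    that share to its own score."""
    new_scores = dict(scores)
    for site, outbound in links.items():
        if not outbound:
            continue  # a site with no outbound links passes nothing on
        share = scores[site] / len(outbound)  # split evenly among targets
        for target in outbound:
            new_scores[target] += share
    return new_scores

# Every site starts at 1.0; site A links to B, C, D and E.
scores = {site: 1.0 for site in "ABCDE"}
links = {"A": ["B", "C", "D", "E"], "B": [], "C": [], "D": [], "E": []}

after_one_round = pass_link_juice(scores, links)
print(after_one_round)  # B through E each receive 0.25, ending at 1.25
```

Running further rounds with more richly linked sites is how the “many iterations” above play out: heavily linked-to sites keep accumulating score.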

So what do you end up with? Some sites end up with many, many inbound links that add to their PageRank. These sites are the ones you typically find at the top of search results for almost any search terms.

So, again, what is link juice? Link juice is the passing of PageRank. So when a site links to another, it is said to be passing on its link juice – basically, it’s passing on a portion of its PageRank. That’s it.

Now, this is a very basic and general example – keywords, niche and many other factors can come into play. But that’s a separate post.