Showing posts with label Internet. Show all posts

14 August 2015

Fight link rot!

Argh, it just happened to me again. I clicked on a link on a webpage only to find that the page on the other end of the link was gone. 404. A victim of link rot.

This kind of thing is more than a hassle. It's a threat to the way we work as scholars, where citing one's sources and evidence is at the heart of what we do. Ideally links would never go away, but the reality is that they do. How often? A study cited in a 2013 NYTimes article found that 49% of links in Supreme Court decisions were gone. The problem has gotten big enough that The New Yorker had a piece on it back in January.

What I wanted to do here is point out that there are ways for us scholars to fight link rot, mostly thanks to the good work of others (and isn't that the whole point of the Internet?). Back in that ideal world, publishers would take care that their links never died, even if they went out of business, but we users can help them out by making sure that their work gets archived when we use it. Instead of simply linking to a page, either link to an archived version of it or archive it when you cite it, so that others can go find it later.

I've used two archiving services, Archive.org and WebCite. Both respect sites' policies about saving copies (i.e., their robots.txt files), but Archive.org keeps re-checking those policies, so it's possible that a page you archived there will later disappear. That won't happen on WebCite. WebCite will also archive a few links deep from the page you ask it to save, while Archive.org archives just that one page.

WebCite is certainly more targeted to the scholarly community, and their links are designed to be used in place of the originals in your work. But both of them are way better than nothing, and you'll find lots of sites using them. For convenience there are bookmarklets for each that you can put in your browser bar for quick archiving (WebCite, Archive.org).
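If you'd rather script this than click a bookmarklet, Archive.org's "Save Page Now" endpoint and Wayback availability API can be driven from code. Here's a minimal sketch; the endpoint forms are the ones Archive.org documents, but the example article URL is made up, and no network request is actually made:

```python
from urllib.parse import quote

def save_url(url: str) -> str:
    # Wayback Machine "Save Page Now": issuing a GET to this URL
    # asks Archive.org to take a snapshot of the page.
    return "https://web.archive.org/save/" + url

def availability_url(url: str) -> str:
    # Wayback availability API: returns JSON describing the closest
    # existing snapshot of the page, if one exists.
    return "https://archive.org/wayback/available?url=" + quote(url, safe="")

# Hypothetical article URL, just for illustration:
print(save_url("https://example.com/article"))
# → https://web.archive.org/save/https://example.com/article

# To actually trigger the archive, you'd fetch that URL, e.g.:
#   import urllib.request
#   urllib.request.urlopen(save_url("https://example.com/article"))
```

The nice thing about doing it this way is that you could run it over every external link in a bibliography in one go, instead of archiving pages one at a time.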

So next time you cite a page, make sure you archive it. Maybe even use WebCite links in your stuff (like I did in this post on the non-Wikipedia links).

(FYI, another service is provided by Perma.cc, which is designed for the law, covered in this NPR story.)

Added 18 August 2015: Tom Elliott (@paregorios) notes this article on using WebCite in the field of public health (which I link to via its DOI).

13 April 2011

Paywall? What paywall?

Finally hit the NYTimes 20-article limit this evening. In place of the URL of the page I was visiting, up came another URL, suspiciously like the first, but with "?gwh=" and a string of 32 (who'd have guessed?) alphanumeric characters. Then the very pretty alert box comes up, blocking the page, and telling me that I've hit the magic number.
"Gee," I wonder, "what would happen if I just deleted that whole last bit from the question mark on?"
What happens is that the page I wanted just comes back, without the warning.
1. Why bother? Are people so bad at using their browsers that this would be effective?
2. Why pay $40M for this?
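The trick above amounts to dropping everything from the question mark on. For the curious, here's what that looks like with Python's urllib (the article URL is a made-up example, not a real NYTimes link):

```python
from urllib.parse import urlsplit, urlunsplit

def strip_query(url: str) -> str:
    # Drop the query string (and fragment), keeping scheme, host, and path.
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

# Hypothetical blocked URL with the 32-character gwh token appended:
blocked = "https://www.nytimes.com/2011/04/13/some-story.html?gwh=0123456789abcdef0123456789abcdef"
print(strip_query(blocked))
# → https://www.nytimes.com/2011/04/13/some-story.html
```

Which is to say: the entire enforcement mechanism lives in a query parameter that the reader is free to delete.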

Oh, and if you're quick, you can hit command-. to cancel the page load before it hits the warning, but after the content you want to read has loaded.

20 March 2011

The NYTimes' upcoming paywall

I don't quite get their plan on this (which makes me think it's about as well thought out as their other ventures and attitudes on the intertubes).

First, at no cost everybody gets to read 20 articles a month, and also to browse the "home page, section fronts, blog fronts and classifieds," and read the Top News section in the apps. I'm going to guess that this is going to be enough for a lot of people: "What are the headlines today? Ooh, let me read about this breaking story. OK, I'm done now." BTW, since section fronts sometimes include Top News stories, I assume links to these will be free wherever you click on them. For example, right now the top three stories on the World section front are the top three stories in the iPad Top News section. (Truth be told, I assume they'll goof this up and the links won't always be free if you don't click on them in the right place.)

Second, they're charging extra for access on both an iPhone and an iPad app. Huh?

Third, if you come to one of their articles via a search engine, blog, or other social media link, that article will count against your 20 if you haven't hit it yet, but you will also still be able to read it if you have already hit 20. Search hits will have a 5-a-day limit. How is this not a giant hole in their plans? The truly devious could simply enter the title they want to read into their browser's search bar, and then click on the resulting link in the search engine. A little more work, but not much. Or they could put the link into their own tweets or Facebook or blog.

And I assume they're keeping track of this via cookies, which is another hole: a decent cookie manager can let a user easily switch identities and circumvent the monthly limit with ease.

The NYTimes should know better than anyone how its users reach their articles, so maybe they've got this sussed out. I just can't help thinking that they don't.

On a side note, I also don't get why their bloggers keep their blogs at NYTimes.com. Krugman and Silver and the others could likely do what Frank Rich is doing and not lose much readership. Since they don't seem to get paid anyway, what does it matter to them? If their readership does drop after the paywall goes up, will they stay?

(Gruber's had a few posts on this. Here's one.)

Edited to add (21 March): That didn't take long: a Twitter feed of all NYT articles is now up and running.