Wikipedia Challenges

This was sent in Issue #55 of Squirrel Notes on December 23, 2020.

This is a sobering article:

Wikipedia is in some trouble

The author explains many problems facing Wikipedia right now. Some are organizational, some are social, some are political, but many are purely technical, based on decisions that were made long ago.

The markup is old, and it’s a problem.

If wikipedia had used markdown, html, or some standardised format, any parser would flip-it into other future formats. Wikipedia’s custom language is just clearly insane, undocumented, hopeless.

Given the sheer volume of Wikipedia content, how do you fix that? Can you imagine trying to come up with a new standard and then migrating to it?

And less than half the URLs even go to articles, it turns out.

[...] of the 14m records in the wikipedia dump, only 5.5m (40%) are public-facing articles. There are 8,550,441 redirects in wikipedia. They are mostly typos, or case-changes, and are mostly created by hand, every day. And what happens to a redirect when a page gets deleted, or merged, or split...
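To make the redirect problem concrete, here is a minimal sketch of what "following a redirect" involves once pages can be deleted or merged out from under it. Everything here is an assumption for illustration: the `resolve_redirect` function, the `max_hops` limit, and the toy titles are made up, and this is not how MediaWiki itself handles redirects.

```python
def resolve_redirect(title, redirects, articles, max_hops=10):
    """Follow redirects from `title` until a real article is reached.

    Returns the final article title, or None if the chain dangles
    (points at a page that no longer exists), loops, or is too long.
    """
    seen = set()
    current = title
    while current in redirects:
        if current in seen or len(seen) >= max_hops:
            return None  # loop, or suspiciously long chain
        seen.add(current)
        current = redirects[current]
    return current if current in articles else None


# Hypothetical toy data: one real article, a typo and a case-change
# redirect pointing at it, and a redirect whose target was deleted.
articles = {"Albanian language"}
redirects = {
    "Albanian Language": "Albanian language",   # case change
    "Albanain language": "Albanian Language",   # typo, chained
    "Old page": "Deleted page",                 # target no longer exists
}

print(resolve_redirect("Albanain language", redirects, articles))
print(resolve_redirect("Old page", redirects, articles))
```

The second call returns `None`, which is the uncomfortable case: nothing in the redirect itself says its target went away, so someone (or something) has to go looking for it.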

And Wikipedia runs into basic information architecture (IA) problems, because category structures tend to break down when they get too deep.

Wikipedia has many-thousands of categories. They loop-around all-over the place.

Albanian language → [7-8 nested categories, then...] → Languages of Kosovo → Albanian language
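A loop like that is an ordinary cycle in a directed graph, and finding one is a standard depth-first search. The sketch below assumes the categories have already been loaded into a plain dict mapping each category to its subcategories. The `find_cycle` name and the single-letter category names are placeholders, not real Wikipedia data, and a recursive version like this would need to be rewritten iteratively for a graph with many thousands of deeply nested categories.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (start node repeated at the end),
    or None if the graph has no cycle.

    `graph` maps each node to an iterable of its children.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # nodes on the current DFS path, in order

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for child in graph.get(node, ()):
            child_state = state.get(child, UNSEEN)
            if child_state == IN_PROGRESS:
                # child is already on the current path: that is the loop
                return path[path.index(child):] + [child]
            if child_state == UNSEEN:
                found = visit(child)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(graph):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Hypothetical category tree: A contains B, B contains C, C contains A again.
categories = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
print(" → ".join(find_cycle(categories)))
```

Detecting the loop is the easy part. Deciding which link in it is the wrong one is an editorial call, which is probably why these loops persist.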

Related: I read an amazing book about MediaWiki a couple years ago. It’s a remarkable anti-CMS.
