Connecting PCGamingWiki and Wikidata
Adding PCGamingWiki identifiers to more than 3000 video games on Wikidata
The founder of PCGamingWiki, Andytizer, asked me to do a writeup on my work importing PCGamingWiki’s dataset into Wikidata. In this blog post I’ll go through creating the PCGamingWiki ID Wikidata property, building the importer, and the purpose and impact of this project.
I’m writing this while the script actually runs, which is taking quite a while due to the various checks and safeguards in place, plus the script needing a restart every few minutes due to an error I haven’t figured out (more on that later). This will be a pretty simple post: my goal is just to provide an example of a data import for others to hopefully reference in the future. It won’t be particularly in-depth, nor will you be able to go and import your own dataset with the information provided in this post alone.
Explanation
Before I get into it, I want to go over some of the terms I’ll be using here:
- Ruby: A programming language used primarily for scripting and web development; I wrote the scripts for this project in Ruby.
- Wikidata: A project by The Wikimedia Foundation (you know them for Wikipedia, probably) aimed at cataloguing data on everything. Films, television shows, books, games, people, places, scholarly articles, animals, stars, and more.
- “Wikidata item”: Any item in the Wikidata dataset (usually identified by something like `Q2074746`).
- PCGamingWiki: A wiki that attempts to collect fixes and information on every PC video game ever released, whether that be DOS, macOS, Linux, Windows, or any of the other “personal computer” platforms.
Creating the PCGamingWiki ID property
On Wikidata, Properties are attributes that can be applied to Wikidata items: for example, a video game may have a developer, publisher, genre, publication date, etc.
Anyone can create a property proposal on Wikidata, and then other members of the community can give feedback or support/oppose the property’s creation. If “consensus” is reached, the property can be created a week after the proposal’s creation.
You can see my proposal for the PCGamingWiki ID property here. The proposal went through pretty smoothly, with only one person opposed.
In the end, my proposal resulted in the creation of the 6337th Wikidata property: PCGamingWiki ID (P6337).
Creating the Import Script
Starting with mix’n’match
I started the import with mix’n’match, which is a tool hosted by the Wikimedia Foundation that can be used to manually associate items in another database with items in Wikidata. It’s a pretty neat tool, and was relatively easy to get set up with.
I created the PCGamingWiki catalog for mix’n’match using the following two scripts.
First, `generate_pcgw_list.rb`, which uses the PCGamingWiki API to generate a JSON file with information on every PCGamingWiki article in the Games category:
```ruby
require 'open-uri'
require 'json'

gcmcontinue_value = ""
games_list = []
index = 0

loop do
  if index == 0
    api_url = 'https://pcgamingwiki.com/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Games&prop=info&inprop=url&gcmlimit=500&format=json'
  end

  if gcmcontinue_value != ""
    api_url = "https://pcgamingwiki.com/w/api.php?action=query&generator=categorymembers&gcmtitle=Category:Games&prop=info&inprop=url&gcmlimit=500&format=json&gcmcontinue=#{gcmcontinue_value}"
  end

  games_json = JSON.load(open(api_url))

  # Store the continuation token if the API says there are more results.
  unless games_json["continue"].nil?
    gcmcontinue_value = games_json["continue"]["gcmcontinue"]
  end

  games_json["query"]["pages"].each do |page|
    page = page[1]
    game = {}
    game["pageid"] = page["pageid"]
    game["title"] = page["title"]
    game["fullurl"] = page["fullurl"]
    game["pcgw_id"] = game["fullurl"].sub('https://pcgamingwiki.com/wiki/', '')
    games_list << game
  end

  break if games_json["continue"].nil?
  index += 1
end

File.write('pcgw_games_list.json', games_list.to_json)
```
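For reference, each entry in the resulting `pcgw_games_list.json` ends up shaped roughly like this (the values here are illustrative, not taken from the real dump):

```ruby
# Illustrative shape of one entry in pcgw_games_list.json; the pageid is
# made up for this example.
example_entry = {
  "pageid"  => 123,
  "title"   => "Half-Life",
  "fullurl" => "https://pcgamingwiki.com/wiki/Half-Life",
  "pcgw_id" => "Half-Life"
}

# The pcgw_id is just the fullurl with the wiki prefix stripped off.
derived_id = example_entry["fullurl"].sub('https://pcgamingwiki.com/wiki/', '')
```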
Second, `pcgw_games_to_mixnmatch.rb`, which takes the JSON file generated by the previous script and turns it into a tab-delimited file that mix’n’match will accept:
```ruby
require 'open-uri'
require 'json'

games_list = JSON.parse(File.read('pcgw_games_list.json'))
# Remove any duplicate IDs that differ only in capitalization.
games_list.uniq! { |game| game['pcgw_id'].downcase }

game_array = []
games_list.each do |game|
  game_array << "#{game['pcgw_id']}\t#{game['title']}\tn/a"
end

File.open("pcgw_catalog.txt", "w+") do |f|
  game_array.each { |element| f.puts(element) }
end
```
mix’n’match has a specific tab-delimited import format, which requires a name, a catalog ID (in this case the PCGamingWiki ID), and a description. PCGamingWiki doesn’t have descriptions as it’s not a traditional wiki like Wikipedia, so I left those as `n/a`. I spent a bit of time trying to generate descriptions automatically (e.g. something like `video game from 2004 by Valve Corporation`) by getting the release date and developer for each game as well, but due to limitations with the PCGamingWiki API I wasn’t able to get that working.
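To make the format concrete, here’s what a single catalog line looks like when built the way the script above does it (the game data here is invented for the example):

```ruby
# Build one mix'n'match catalog line: PCGW ID, title, and description,
# separated by tabs. The description is always "n/a" in this import.
game = { 'pcgw_id' => 'Half-Life_2', 'title' => 'Half-Life 2' }
line = "#{game['pcgw_id']}\t#{game['title']}\tn/a"
```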
The lack of descriptions in the dataset unfortunately makes using mix’n’match less effective, and I’d like to revisit it in the future to make it easier to manually match PCGamingWiki articles with Wikidata items, which will likely be the way the rest of the dataset will have to be imported.
You can see the catalog of PCGamingWiki IDs in mix’n’match here: PCGamingWiki IDs catalog
While I was able to get just under 150 Wikidata items set up with PCGamingWiki IDs via mix’n’match, the main purpose of associating the data manually in the first place was to verify the quality of the data.
Once that was done, I could get started on the actual Ruby script.
Writing the actual script
This code is:
- Bad
- Somewhat commented
- Dependent on gems you’ll need to install (the `sparql` and `mediawiki_api-wikidata` gems, specifically)
- Reliant on a CSV file of all the PCGamingWiki articles and their Steam App IDs (Vetle kindly provided two PHP scripts for generating this CSV: `pcgw_export.php` and `create_pcgw_steamid_csv.php`)
- Going to eventually fail with some sort of DNS error I can’t be bothered to figure out
The code is also functional, and works well enough for my purposes. I wouldn’t recommend copying it wholesale because, as I said, it’s bad. It could probably be refactored quite a bit to be easier to follow, to output progress information in a better way, and to catch errors more cleanly, but this is what I ended up with, and since it works I didn’t want to go through the process of making it readable or easily reusable.
You can run it with `WIKIDATA_USERNAME=username WIKIDATA_PASSWORD=password ruby pcgw-to-wikidata.rb` (be sure to replace `username` and `password` with your actual username and password). I wouldn’t recommend running it without understanding the consequences, however: it will modify data on the actual Wikidata website. You should also use a bot account to run this script.
```ruby
# Dependencies:
# gem install sparql
# gem install mediawiki_api-wikidata
require 'sparql/client'
require 'json'
require 'csv'
require 'open-uri'
require 'mediawiki_api'
require "mediawiki_api/wikidata/wikidata_client"
require "net/http"

# SPARQL query generator: pass the Steam App ID and it'll return a query
# that finds any Wikidata items with that App ID.
def query(steam_app_id)
  sparql = <<-SPARQL
    SELECT ?item ?itemLabel WHERE {
      ?item wdt:P1733 "#{steam_app_id}".
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    LIMIT 10
  SPARQL
  return sparql
end

# Finds Wikidata items based on the Steam App ID it's passed.
# This uses the SPARQL query defined in query().
def find_wikidata_item_by_steam_app_id(app_id)
  endpoint = "https://query.wikidata.org/sparql"
  client = SPARQL::Client.new(endpoint, method: :get)
  sparql = query(app_id)

  begin
    rows = client.query(sparql)
  # Catch the DNS error that occurs somewhat randomly.
  # The script doesn't seem to be capable of recovering without a restart,
  # so this doesn't actually do much.
  rescue SocketError => e
    puts e
    sleep 5
    return nil
  end

  # If there are 0 rows (no data returned) or more than one row, just skip it.
  # Steam App IDs should be unique (in theory).
  return nil if rows.size != 1

  return_row = {}
  rows.each do |row|
    return_row = { url: row.to_h[:item].to_s, title: row.to_h[:itemLabel].to_s }
  end
  return return_row
end

# Verify that the PCGW ID is valid by sending a request to the article page.
def verify_pcgw_url(pcgw_id)
  url = URI.parse("https://pcgamingwiki.com/wiki/#{pcgw_id}")
  req = Net::HTTP.new(url.host, url.port)
  req.use_ssl = true
  res = req.request_head(url.path)
  # Only a 200 response means the article exists.
  return res.code == "200"
end

pcgw_steam_ids = []

# Go through the CSV and create a hash for each PCGW item and its Steam App ID.
# The CSV is in a format like this:
# "Half-Life",70
# "Half-Life 2",220
# "Half-Life 2: Deathmatch",320
# "Half-Life 2: Episode One",380
# "Half-Life 2: Episode Two",420
# "Half-Life 2: Lost Coast",340
# "Half-Life Deathmatch: Source",360
# "Half-Life: Blue Shift",130
# "Half-Life: Opposing Force",50
# "Half-Life: Source",280
#
# The CSV needs to be in the same directory as this script, with
# the name 'pcgw_steam_ids.csv'.
CSV.foreach(
  File.join(File.dirname(__FILE__), 'pcgw_steam_ids.csv'),
  skip_blanks: true,
  headers: false,
  encoding: 'ISO-8859-1'
) do |row|
  # Skip the row if the title is longer than 40 characters. This is a hack to
  # get around a weird issue where some game titles have really screwy
  # encoding problems.
  next if row[0].length > 40
  pcgw_steam_ids << {
    title: row[0],
    steam_app_id: row[1]
  }
end

# Authenticate with Wikidata.
wikidata_client = MediawikiApi::Wikidata::WikidataClient.new "https://www.wikidata.org/w/api.php"
wikidata_client.log_in ENV["WIKIDATA_USERNAME"], ENV["WIKIDATA_PASSWORD"]

# For every PCGW item created from the CSV, find the respective Wikidata item
# and then compare the title of the PCGW item and the Wikidata item found via
# the Steam App ID.
pcgw_steam_ids.each do |game|
  # Get the Wikidata item for the current game's Steam App ID.
  wikidata_item = find_wikidata_item_by_steam_app_id(game[:steam_app_id])
  # If no Wikidata item is returned, skip this PCGW item.
  next if wikidata_item.nil?
  next if game[:title].encoding.to_s != "ISO-8859-1"

  # Replace the spaces in the game title with underscores to get the PCGamingWiki ID.
  game[:pcgw_id] = game[:title].gsub(/ /, '_')

  begin
    # Compare the game title from PCGW and the Wikidata item's title.
    # These are downcased so that minor differences in capitalization don't
    # cause the script to skip them.
    if game[:title].downcase == wikidata_item[:title].downcase
      wikidata_id = wikidata_item[:url].sub('http://www.wikidata.org/entity/', '')
      puts "Wikidata Item ID: #{wikidata_id}, game[:pcgw_id]: #{game[:pcgw_id]}"

      # Check if the property already exists, and skip if it does.
      claims = JSON.load(open("https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=#{wikidata_id}&property=P6337&format=json"))
      if claims["claims"] != {}
        puts "This already has a PCGW ID"
        next
      end
      puts "This doesn't have a PCGW ID yet"

      # Verify that the PCGamingWiki ID is valid and has an associated page.
      # This is mostly a sanity check to make sure no incorrect values
      # make it into Wikidata.
      if !verify_pcgw_url(game[:pcgw_id])
        puts "No PCGW page with the id #{game[:pcgw_id]}."
        next
      end

      # Create the claim of the PCGamingWiki ID for the given Wikidata item.
      wikidata_client.create_claim wikidata_id, "value", "P6337", "\"#{game[:pcgw_id]}\""
      puts "Updated #{game[:title]}: #{wikidata_item[:url]}"
    else
      # Print a message about the game titles not being equivalent and
      # continue to the next entry.
      puts "#{game[:title]} does not equal #{wikidata_item[:title]}"
    end
  rescue Encoding::CompatibilityError => e
    puts e
  end

  # Sleep for 1 second to ensure we don't get rate limited.
  sleep(1)
end
```
Here’s an example of what the script outputs as it’s running:
```
Wikidata Item ID: Q279744, game[:pcgw_id]: Half-Life
This already has a PCGW ID
Wikidata Item ID: Q193581, game[:pcgw_id]: Half-Life_2
This already has a PCGW ID
Wikidata Item ID: Q1325680, game[:pcgw_id]: Half-Life_2:_Deathmatch
This doesn't have a PCGW ID yet
Updated Half-Life 2: Deathmatch: http://www.wikidata.org/entity/Q1325680
Wikidata Item ID: Q18951, game[:pcgw_id]: Half-Life_2:_Episode_One
This doesn't have a PCGW ID yet
Updated Half-Life 2: Episode One: http://www.wikidata.org/entity/Q18951
Wikidata Item ID: Q553308, game[:pcgw_id]: Half-Life_2:_Episode_Two
This doesn't have a PCGW ID yet
Updated Half-Life 2: Episode Two: http://www.wikidata.org/entity/Q553308
Wikidata Item ID: Q223629, game[:pcgw_id]: Half-Life_2:_Lost_Coast
This doesn't have a PCGW ID yet
Updated Half-Life 2: Lost Coast: http://www.wikidata.org/entity/Q223629
Wikidata Item ID: Q55354219, game[:pcgw_id]: Half-Life_Deathmatch:_Source
This doesn't have a PCGW ID yet
Updated Half-Life Deathmatch: Source: http://www.wikidata.org/entity/Q55354219
Wikidata Item ID: Q831796, game[:pcgw_id]: Half-Life:_Blue_Shift
This doesn't have a PCGW ID yet
Updated Half-Life: Blue Shift: http://www.wikidata.org/entity/Q831796
Wikidata Item ID: Q693937, game[:pcgw_id]: Half-Life:_Opposing_Force
This doesn't have a PCGW ID yet
Updated Half-Life: Opposing Force: http://www.wikidata.org/entity/Q693937
Wikidata Item ID: Q1199351, game[:pcgw_id]: Half-Life:_Source
This doesn't have a PCGW ID yet
Updated Half-Life: Source: http://www.wikidata.org/entity/Q1199351
```
It’s not the prettiest output, and could definitely be improved (e.g. a line break to better differentiate between separate games, SUCCESS/FAILURE messages to make it easier to skim, printing all the “Game Name is not equal Game Name 2” messages into a file for future use, etc.), but it works for my purposes here.
The script has a few safeguards/checks in place to ensure that no PCGamingWiki IDs are associated with the incorrect Wikidata item:
- Verify that the PCGamingWiki ID is valid by sending a request to the respective PCGamingWiki article and checking that it succeeds.
- Compare the PCGamingWiki article title and the Wikidata item title. This needs to be a perfect match, with the exception of case sensitivity, which is disregarded. This results in quite a few false negatives (e.g. `Nier:Automata` and `Nier: Automata` not matching despite obviously being the same game), but it prevents submission of a wide range of possibly incorrect data. In the future I may return to the false negatives and add their associations manually.
- Match based on Steam App IDs. This means that if either the PCGamingWiki article or the Wikidata item doesn’t have an associated Steam App ID, they can’t be matched to one another. It also prevents association of games that aren’t available on Steam (for example, Fortnite is only available via Epic Games, and some EA games are only available on Origin). This isn’t ideal, but it’s the best way to accurately associate a large portion of the catalog, as Steam IDs are common on Wikidata items as well as PCGamingWiki articles. It may be worth modifying the script to also create matches based on other IDs, but as a first attempt, using exclusively Steam IDs is definitely the best solution.
Due to these limitations, a few hundred correct matches are probably missed, but I’d argue that’s a better result than introducing incorrect data into Wikidata.
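One possible way to cut down on the title-comparison false negatives, which I haven’t tried against the full dataset, would be to normalize both titles before comparing them, so that differences in punctuation and spacing are ignored (at the cost of some risk of false positives between similarly-named games). A minimal sketch:

```ruby
# Lowercase the title and strip everything except letters and digits, so
# "Nier:Automata" and "Nier: Automata" compare as equal. This is a sketch,
# not something the import script actually does.
def normalize_title(title)
  title.downcase.gsub(/[^a-z0-9]/, '')
end
```

With this, the comparison in the main loop would become `normalize_title(game[:title]) == normalize_title(wikidata_item[:title])`, though the results would need a manual review pass, since two genuinely different games can normalize to the same string.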
The DNS error
This is the traceback I get (I named the script mentioned above `csgo.rb` for reasons that are now lost to me):
```
Failed to open TCP connection to query.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known)
Traceback (most recent call last):
        18: from csgo.rb:102:in `<main>'
        17: from csgo.rb:102:in `each'
        16: from csgo.rb:121:in `block in <main>'
        15: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:35:in `open'
        14: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:736:in `open'
        13: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:165:in `open_uri'
        12: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:224:in `open_loop'
        11: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:224:in `catch'
        10: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:226:in `block in open_loop'
         9: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:756:in `buffer_open'
         8: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:337:in `open_http'
         7: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:919:in `start'
         6: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
         5: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:945:in `connect'
         4: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/timeout.rb:103:in `timeout'
         3: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
         2: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:947:in `block in connect'
         1: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:947:in `open'
/Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:947:in `initialize': getaddrinfo: nodename nor servname provided, or not known (SocketError)
        17: from csgo.rb:102:in `<main>'
        16: from csgo.rb:102:in `each'
        15: from csgo.rb:121:in `block in <main>'
        14: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:35:in `open'
        13: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:736:in `open'
        12: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:165:in `open_uri'
        11: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:224:in `open_loop'
        10: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:224:in `catch'
         9: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:226:in `block in open_loop'
         8: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:756:in `buffer_open'
         7: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/open-uri.rb:337:in `open_http'
         6: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:919:in `start'
         5: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
         4: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:945:in `connect'
         3: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/timeout.rb:103:in `timeout'
         2: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
         1: from /Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:946:in `block in connect'
/Users/connorshea/.rbenv/versions/2.6.0/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect': Failed to open TCP connection to www.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known) (SocketError)
```
The script would just get stuck with this error repeated endlessly:
```
Failed to open TCP connection to query.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known)
Failed to open TCP connection to query.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known)
Failed to open TCP connection to query.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known)
Failed to open TCP connection to query.wikidata.org:443 (getaddrinfo: nodename nor servname provided, or not known)
```
Best I can tell, this is caused by some sort of DNS issue (closest StackOverflow question I could find), but it wasn’t enough of a problem for me to actually investigate it deeply and resolve it. It could be caused by any number of things: a rate limiter on the Wikidata side of things, a bug in Ruby (it occurs in both Ruby 2.5.3 and 2.6.0, so it’s not a Ruby 2.6-specific issue), a problem with my router or ISP, or a bug in one of the libraries I’m using.
If anyone has any suggestions on fixes for this, it’d be appreciated. As it is now, I have to babysit the script and restart it whenever this error shows up, typically after just a few minutes.
Bonus `leftovers.rb` Script
After completing the automated portion of the import, I wanted to handle the “leftovers”, so I wrote another Ruby script that runs in the command line, called `leftovers.rb`.
I took my terminal output from running the automated script, then got every line from it that had “does not equal” (e.g. “Nier: Automata does not equal Nier:Automata”) and used Visual Studio Code’s multi-cursor selection abilities to make a JSON file that looked a bit like this (but with around 500 entries):
```json
[
  {
    "pcgw": "King of Dragon Pass (2015)",
    "wikidata": "King of Dragon Pass"
  },
  {
    "pcgw": "King's Quest (2015)",
    "wikidata": "King's Quest"
  }
]
```
I then took that file and used the Wikidata API plus the PCGamingWiki data I already had to get the PCGamingWiki IDs and Wikidata item values (e.g. Q123456). I also added a field called `checked_for_match`. The resulting `leftovers.json` looked like this:
```json
[
  {
    "pcgw": "King of Dragon Pass (2015)",
    "wikidata": "King of Dragon Pass",
    "pcgw_id": "King_of_Dragon_Pass_(2015)",
    "checked_for_match": true,
    "wikidata_item": "Q6412224"
  },
  {
    "pcgw": "King's Quest (2015)",
    "wikidata": "King's Quest",
    "pcgw_id": "King's_Quest_(2015)",
    "checked_for_match": true,
    "wikidata_item": "Q4042212"
  }
]
```
From there I wrote the `leftovers.rb` Ruby script. Essentially, it’s a command line utility that helps the user go through the `leftovers.json` file and manually match the remaining “leftover” games. It checks to make sure the Wikidata item doesn’t already have a PCGW ID, then opens the Wikidata and PCGamingWiki pages in the browser for the user to compare (this uses the `open` macOS terminal command; I’m not sure if an equivalent exists on Windows or Linux, so I doubt this script will work on other operating systems without modification), and awaits a y/n response in the terminal. It updates the `checked_for_match` value of each game as the user goes through the JSON, and rewrites the JSON file on every iteration, which allows the script to be restarted without the user losing progress.

This is the `leftovers.rb` script:
```ruby
# Handle leftovers with an interactive CLI that opens the PCGW and Wikidata
# pages in the browser for easy comparison. It uses a JSON file in the
# format of leftovers.example.json and marks entries as "checked for match"
# whenever the game has been checked. This allows you to easily stop and start
# the script at will without needing to redo any of the checks you've already
# done.
require 'json'
require 'mediawiki_api'
require 'mediawiki_api/wikidata/wikidata_client'
require 'open-uri'

# Load the leftover games.
leftovers = JSON.load(File.read('leftovers.json'))

# Authenticate with Wikidata.
wikidata_client = MediawikiApi::Wikidata::WikidataClient.new "https://www.wikidata.org/w/api.php"
wikidata_client.log_in ENV["WIKIDATA_USERNAME"], ENV["WIKIDATA_PASSWORD"]

# Go through each leftover game.
leftovers.each do |game|
  # Skip if the game has already been checked.
  next if game['checked_for_match']

  # Skip if the Wikidata item already has a PCGamingWiki ID.
  claims = JSON.load(open("https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=#{game['wikidata_item']}&property=P6337&format=json"))
  if claims["claims"] != {}
    puts "This already has a PCGW ID"
    game['checked_for_match'] = true
    next
  end

  puts "PCGamingWiki: #{game['pcgw']}"
  puts "Wikidata: #{game['wikidata']}"

  # Open the PCGamingWiki article in the browser.
  pcgw_url = "https://pcgamingwiki.com/wiki/#{game['pcgw_id']}"
  system("open \"#{pcgw_url}\"")

  # Open the Wikidata item in the browser.
  wikidata_url = "https://www.wikidata.org/wiki/#{game['wikidata_item']}"
  system("open '#{wikidata_url}'")

  # Await a y/n response from the user.
  puts "Does the PCGW item match the Wikidata item? [y/n]:"
  items_match = gets.chomp

  # If the user responds with a yes, update the Wikidata item.
  if items_match.downcase == 'y'
    puts "Updating the wikidata item."
    wikidata_client.create_claim game['wikidata_item'], "value", "P6337", "\"#{game['pcgw_id']}\""
  end

  # Mark the game as checked_for_match.
  game['checked_for_match'] = true
  # Update the leftovers.json file.
  File.write('leftovers.json', leftovers.to_json)
  puts
end
```
In the terminal, the output looks something like this:
```
PCGamingWiki: Nights of Azure
Wikidata: Yoru no Nai Kuni
Does the PCGW item match the Wikidata item? [y/n]:
y
Updating the wikidata item.

PCGamingWiki: No Man's Sky
Wikidata: No Man’s Sky
Does the PCGW item match the Wikidata item? [y/n]:
y
Updating the wikidata item.

This already has a PCGW ID
PCGamingWiki: Oddworld: New 'n' Tasty!
Wikidata: Oddworld: Abe's Oddysee New N' Tasty!
Does the PCGW item match the Wikidata item? [y/n]:
```
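As an aside, the macOS-only `open` calls could probably be made portable with a small helper that picks each platform’s standard launcher. This is an untested sketch (I’ve only actually run the script on macOS):

```ruby
require 'rbconfig'

# Pick the command that opens a URL in the default browser on each platform:
# `open` on macOS, the `start` cmd.exe builtin on Windows, and `xdg-open` on
# most Linux desktops.
def browser_command(host_os = RbConfig::CONFIG['host_os'])
  case host_os
  when /darwin/      then 'open'
  when /mswin|mingw/ then 'start'
  else                    'xdg-open'
  end
end

def open_in_browser(url)
  system("#{browser_command} \"#{url}\"")
end
```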
It probably would’ve been a better idea to make this a webpage that loads the PCGamingWiki and Wikidata pages in iframes within the same window, but I already had most of the components (Wikidata authentication, etc.) written in other Ruby scripts, so I went with Ruby.
I’m pretty proud of this simple little script. It’s not as elegant as it could be, but it works pretty well and helps close the gap for the remaining articles that the automated script couldn’t guarantee matches for.
Purpose and Impact
I started this project because I was hoping to find a good source of information on all kinds of video games, and had hoped Wikidata would be that source. Wikidata, while it covers a huge range of games, still doesn’t have the depth nor the breadth necessary to source a proper video game database. My hope is that by connecting Wikidata and PCGamingWiki, Wikidata will gain more good data from the well-curated data on PCGamingWiki, and that PCGamingWiki will similarly benefit from any improvements or new data Wikidata provides.
The actual impact thus far is that more than 3200 items on Wikidata now have PCGamingWiki IDs attached to them. The Wikidata community can use this information to potentially get highly accurate data on developers, publishers, release dates, and more. PCGamingWiki also benefits from this project, as it can now more easily access data about the games and provide links to other databases, including the GiantBomb Wiki, Metacritic, GameRankings, and others.
I hope this writeup is useful to others interested in improving Wikidata’s video game coverage, and I hope this project will help both Wikidata and PCGamingWiki become more complete video game databases :)
Thanks to Andytizer for creating PCGamingWiki and allowing me to do this project, snuxoll and Vetle in the PCGamingWiki Discord for their help with the PCGamingWiki API, and Jean-Frédéric on Wikidata for his help with the PCGW ID property and guidance with importing the data.
The code in this post is also available in my random-scripts repository, and is licensed under the MIT License.