Now collecting new posts on new forums

As I type this it is 2019-04-08 and the Frontier forums have been up again on the XenForo v2 version for 11 days. Sadly it turned out that the 'Dev Feed' has two problems:

  1. Whilst the "Staff / Developer Posts" is listing both threads that 'developers's started and any others they posted in, they're listed in order of a latest reply by anyone. Also the 'Dev Post' button goes to the latest 'developer' post in the given thread, leaving you no easy way to find prior ones.
  2. There is RSS output available for that URL (append '?rss=1'), but the resulting feed only includes the [i]first[/i] 'developer' post per thread.

So I felt obligated to update my scraper to work against the new forums. In some ways this proved to be easier than for the old site (e.g. consistent Content-Type of utf8, instead of randomly changing, so no hacks required to deal with that). I have also been given a key for the XF2 API which makes retrieving some of the data cleaner, with less load on the forums servers.

For historical purposes I'll keepo the three archives of the pre-XF2 posts available:

  1. Postgresql 'pg_dump --format=plain': athan-ed-dev-rss-archive-20190325.sql.gz
  2. RSS, precis text only: athan-ed-dev-rss-archive-20190325.rss.gz
  3. RSS, fulltext: athan-ed-dev-rss-archive-20190325.rss.gz

What is this?

If you're a fan of Elite: Dangerous then you may also follow the Frontier Developments forums. If you do that then you may be frustrated at the lack of an easy way to see just the developer posts.

You could learn who they are and check around their profile pages manually now and then.

You could write a simple scraper that aggregates all the known developer posts onto one page.

Or you could decide that you much prefer an RSS feed, so write a scraper and RSS output for it all. That's what I did. I even ended up making an additional feed with the full "clicked-through" text of each post.

How do I get it?

If your web browser is set up with an RSS feed extension then you should see the RSS icon up in your URL bar. Simply click on that and get on with adding one of the feeds to your reader.

NB: The feeds are updated at most every 5 minutes, more slowly if the forums are especially busy. Thus there's no point polling these feeds more often than every 5 minutes. Also the output contains the last 28 days worth of posts, so if you check the feed any less frequently than that you will miss posts.

If for some reason you need to enter a feed manually, then these are the URLs:

  1. The legacy 'precis text only' feed - https://ed.miggy.org/devtracker/ed-dev-posts.rss
  2. The newer 'full text' feed - https://ed.miggy.org/devtracker/ed-dev-posts-fulltext.rss

Caveat 1 - The feeds will only contain the fully public posts from the forums, i.e. nothing from the Private Backers Forum or the Alpha Backers Forum. Also you'll see all Frontier Developments forum posts, not just those relating to Elite: Dangerous, unless I got around to blacklisting a particular non-ED forum.

Caveat 2 - Each post in the 'precis text only' feed will be as it is seen on the appropriate member profile page's "Activity List". That means you'll only ever see the new text (only a portion of it for longer posts), and never any quoted text.

The 'full text' feed will usually contain full text, including any quotes, from the relevant post. Sometimes it may have fallen back to just the precis text. This would happen if the 'click-through' to retrieve the full post text failed. Instead of failing the whole post, and possibly missing one entirely, I decided to store, and use, at least what I could.

Caveat 3 - I have specifically chosen to not include the QA-* or Support-* accounts in my scraping, as they seldom post anything of much value (to anyone but who they're replying to).

Caveat 4 - Each post will be as it appeared on the forums at the time the scraper first saw it. Posts in the RSS feed are never updated for later edits.

Searching

There's a search available on the main feed page. Or maybe you're curious about the latest seen and scraped post for each tracked developer. This will update after each run of the collector script (so currently every 5 minutes)

Who are you?

I go by the name 'Athanasius' online, and indeed I'm 'Cmdr Athanasius' in Elite: Dangerous. You'll also find me as 'Athan' on QuakeNet's #elite-dangerous and on the Frontier Forums.

But what if you stop supporting this and it breaks?

Well, I do have a history of stopping playing Elite Dangerous, yet I still keep this project going for my own use. That said, if I do completely lose interest this will keep running unless the server goes away (it's paid for and used by other people). If it does disappear then hopefully someone has bookmarked the public github repository and knows enough perl and postgresql to get it up and running elsewhere.

As of 2019-08-28 I am pondering a re-write in Python (3.x, not 2.7.x when that's EOL 1st Jan 2020!) so as to increase the chances of someone else wanting to pick it up.


News

2019-11-05 09:58 UTC
New "Planet Zoo" forums added to the ignore list.

2019-10-17 16:40 UTC
I've just added support for BBCode [s]...[/s] to come out properly as strikethrough.

2019-05-22 09:47 UTC
I've decided to permanently keep the RSS feeds at "last 28 days" of posts, rather than the old-forums "7 days".

2019-04-08 10:56 UTC
Now scraping against the XF2 forums for the live feed. There's still a little fix up that needs doing on some historical posts (where they started a thread, rather than being a reply) to have the URLs point to the actual thread on the new forums, rather than "Not Found".

Other than that everything should be working again, including the search interface (although with a double '/' in some URLs for now, which is harmless).

2019-03-25 11:00 UTC
The old forums are now down for the move to the new forums. I've stopped my scraper from running and will shortly make archives of the posts so far available.

2018-07-13 15:30 UTC
I've added "Off-Topic Discussion" to the list of ignored forums, for what should be obvious reasons.

2018-06-12 16:50 UTC
All of the new JWE related forums are now on the ignore list. A bunch got added for the game launch, so a few posts leaked into this RSS feed.

2018-03-28 15:45 UTC
I've just implemented ignoring of selected forums. This is initially the current four "Jurassic World: Evolution" forums. Once I'm confident this is working without causing issues I may expand that list so that any forum not specific to ED is ignored.

2018-02-03 22:53 UTC
I've removed 'Dale Emasiri' from the scraping as he's no longer working at Frontier.

2017-11-06 13:50 UTC
I've just removed Drew Wagar from the monitored forum accounts. He's no longer actively writing for ED and his posts aren't really relevant to the game any more. Someone let me know if this changes and I'll consider adding him back.

2017-10-10 16:53 UTC
Updated all the URLS to utilise https://ed.miggy.org/ instead of https://miggy.org/games/elite-dangerous/

2017-09-21 09:15 UTC
The new 'full text' feed hasn't shown any issues in over a week of testing, so it's now described and linked to on this page.

2017-06-05 20:05 UTC
Added Lloyd Morgan-Moore (157490) to the scraped forum accounts.

2017-03-10 17:20 UTC
I've just finished work to use what I hope are truly unique and static GUIDs for each post. Before this change if someone on the Frontier Forums edited the title of a topic then the scraper would use the changed URL as the permaLink and thus put it out as an additional new post. Where several posts had already been made within the topic this would cause spam after the edit.

I now strip out the embedded topic title from the URL, using just those parts that are necessary to reach it on the forums. A side-effect of setting this up is that the feed just got spammed with a huge amount of not-actually-new posts due to changing over to using these better GUIDs in the permaLink property of each post. This will be a one-off thing.

2016-12-01 02:49 UTC
Changed the behaviour of the scraper so it no longer takes the first known post it sees as evidence that it is up to date for that forum member's activity. It will still mean skipping that individual post, but it will continue to check all other posts visible on the activity page. NB: This assumes there'll be no more than 50 posts on an Activity List at any one time, else it will think additional ones are new.

2016-12-01 02:49 UTC
Removed Greg Ryder from the list of monitored accounts, as apparently he's been gone from Frontier for over a year now.

2016-11-15 13:28 UTC
I've just added Drew Wagar to the monitored forum posters. He's the author of one of the official books and driver of the 'Formidine Rift' mystery. He doesn't seem to post too much, but when he does it's likely about said mystery.

2016-08-26 15:12 UTC
I've just changed the code to strip the topic titles out of posts URLs. If they change, and they can with a moderator edit, the code would see all the posts in that thread as new again. This is what happened over the past day or so with the "The Galaxy - Is its size now considered to be a barrier to gameplay by the Developers?" thread in "Dangerous Discussion".

2016-08-13 09:41 UTC
I've now removed Ben Parry from the collector. He only ever posts in there Star Citizen thread now.

2016-06-12 18:50 UTC
The issue with the post URLs seems to have been fixed.

Do note that because the URLs now include the thread title in text form it means my scraper has seen a lot of older posts as now-new, as it uses exactly that URL to know if it's seen a post before. So there's been a splurge of ~50 posts repeated in the feed.

2016-06-12
There have been some forum changes that are breaking my scraper. I've got code to work around one (putting text like " 4 replies and 242 views" inside the element of class=date). But the links in the Activity Lists are now also broken, not properly containing the anchor to the specific post within the thread.

I've PM'd BrettC about both and hopefully he'll fix things.

2016-05-05 00:41 UTC
All URLs have been changed to use https://miggy.org/... rather than http://www.miggy.org/. You may want to update your RSS readers. The certificate for HTTPS/SSL is from LetsEncrypt and includes a full enough chain of Certificate Authorities that it should validate in any modern, up to date, web browser.

2016-05-05 00:17 UTC
Added " (<Forum Name>)" text to the end of item titles. This will only be the sub-forum name, but in most cases should help with things like knowing if a post is in an Xbox forum, or a more general one.

2015-07-02 19:00 BST
As the QA-* accounts mostly seem to post things like "thanks for that report", and otherwise things that aren't all that relevant I've decided to no longer scrape them for this feed. Note that FD do now seem to be pushing patch notes/changlogs to the Community Site, and that page has its own RSS feed.

Any posts I'd already scraped for QA-* accounts will stay in the database but no new ones will appear. Also, lack of probably usefulness is why I've not added the Support-* accounts either.

2015-06-16 00:00 BST
I've just finished putting together a very basic Search Interface for the 'precis' text of all the posts in the database that drives this RSS feed. Any feedback appreciated through the usual channels.

2015-06-12 16:04 BST
A small tweak to the scraping code means white space in posts is no longer collapsed, this will preserve the formatting where a post contains multiple paragraphs, bullet points etc.
Past posts will remain 'broken' in this respect.

2015-06-12 15:45 BST
Tweaked the output code to use the https:// URI scheme instead of http://, given the FD forums are enforcing the former now anyway. NB: this change will have caused GUIDs to change and thus the last 7 days worth of posts to re-appear in your feed.

2015-06-05 20:00 BST
I've just changed the RSS generation to output the last 7 days worth of posts, rather than the last 100 posts as it was before. This should hopefully avoid any issue in the future with low-frequency pollers missing posts due to high dev post rate.

2015-05-25 21:15 BST
Further to the last news post there's also an issue with superscripts in posts. I've not yet time to look into this, but it will do things like cause '1011' to be collapsed to '1011'.

Also, there's a becoming-a-FAQ; "Why don't you show the full post, including quotes where present, on each post in the feed?". The answer being:

Currently the scraping is simple, look at (only the first page/initial load of) each users' latest posts on their activity list. That's what only contains what I scrape. To get the full posts and quotes I'd have to follow each post link, incurring an extra page load. As I've already been co-ordinating with the head moderator BrettC on ensuring my scraper bot doesn't cause problems with hammering the forums, particularly when during those times the forums are already under high load, I can't afford to cause that extra load.

2015-01-27 20:05 GMT
It's come to my attention that despite claiming utf-8 output the feed sometimes attempts to include characters from other encodings. Unfortunately the source of the problem is that the FD forums are telling my scraper script that they're in iso-8859-1 format but allowing characters from windows-1252 encoding through as well.

Whilst I could try to hack around this with changing the input encoding or doing some search/replace that feels like it will break on something else in the future, so for now you'll have to put up with the weirdness and blame FD's forum setup for allowing the 'bad' characters'.

2014-10-25 00:47 BST
I've now updated my scripts for the new forums. All looks to be OK, although you might have refreshed the feed just as I had a date error in it (the cause of which has been found and rectified).

If you did then you might fail to pick up new items until one appears after Sat, 25 Oct 2014 16:24:00 +0000, depending on how your RSS reader acts.


[Valid RSS]