Technical Difficulties

So a few days ago, I rebooted my Windows notebook because of a Windows 8 update, and it crashed like it did back in March. I became enraged, and after a few choice words I can summarize as “#$%&! Windows 8 !#@%^&$%!!” I started to recover my system.

The system was stuck in a never-ending loop between a recovery attempt (which failed) and a crash (which prompted another recovery attempt), and nothing seemed to fix it. I eventually gave up on the usual repair options, which seemed useless.

I dug out the DVD and started a clean install. Except the GUI-mode install wouldn’t recognize the disk as a valid target. It claimed the BIOS settings were wrong and that the disk wasn’t marked bootable.

What?

I looked through the BIOS menu, fiddled with this and that, and still nothing.

Eventually I punted, pulled out a Fedora 18 x64 DVD, and installed that. Fedora happily installed onto the “non-bootable disk” and off I went. Except on the reboot from the live disc into the newly installed OS, I received “Operating System Not Found”. Doh!

After more fiddling around *sigh* I believe that the SSD I installed a mere 8 months or so ago has crapped out. Or at least what passes for the boot sector of an SSD, where the BIOS looks for the info needed to boot the rest of the system, has failed. I can boot off the DVD into the system, but booting from the SSD itself bombs out.

This investigation took up an hour or two over the past oh… 5 days or so.

I had a coworker tell me he was disappointed in SSDs due to their failure rate. The rest of us would jokingly tell him that was in the past! Modern SSDs are much better, with huge MTBFs that can support years of writes! On the other hand, my gaming system chewed one up in 8 months. Or I was just unlucky and got a bum SSD.

In any case, while I do have my trusty Mac Mini, I would like to get a Windows system back up and running. For one thing, I’ll miss playing The Secret World. At least LoTRO and GW2 are still available to me.

The other thing I did was configure a new gaming system. Never let a crisis go to waste. 😉 In years past I would build my own PCs, usually SFF (small form factor) systems, and assemble everything after the parts arrived. It was kind of fun. However, now I’m lazy, so instead I went with a vendor that lists everything from barebones to complete systems and lets you configure the various components from a menu.

I wound up spec’ing out:

  • Core i5 4570 (3.6 GHz turbo)
  • MSI B85M P33 mainboard
  • 16 GB mem
  • 1 TB SATA 3 hard drive – yeah, no SSD this time around
  • GeForce GTX 650

I’ve got an old monitor (20 inch; in another few months I might splurge and get another 24 inch like the one for my Mac Mini), mice, and keyboards, so I can reuse those and save a little. The system comes in at just over $1K. I think it compares well against the Ars Technica budget gaming box from April 2013; however, a direct comparison is difficult due to the number of differences.

This should be a nice bump up – the previous system was a 2nd gen Core i7 (2.6 GHz), 6 GB mem, and a GeForce 285M. It served me well, but it is time to replace it.

Windows 8 Update

It was not a good past few days for my Windows 8 notebook, because the recent Patch Tuesday really broke something. After trying to fix things Tuesday evening, I gave up and reinstalled (using the “Remove everything” option new in Windows 8) and, as of late Thursday, got things back to working condition.

My computer crashed during the shutdown after patching, so it then booted into recovery mode, which didn’t do anything to stop a crash on the next reboot. And as a side effect, my networking stack was also nonfunctional.

The Windows 8 crash screen is friendlier looking – it displays a smiley with a nice quote along the lines of “We apologize, there was a terrible error and it all blew up.” No more bugcheck, a.k.a. blue screen of death, which was not useful to most people unless you happened to have a kernel debugger attached.

Good thing I also had my Mac Mini so I wasn’t completely dead in the water! I could still play LoTRO and EVE, and Google search for information, while my Windows notebook was useless.

Anyway, I happen to be set up for some debugging on my home notebook (a habit from my day job), so I looked at the minidump file and noted the problem was indeed in the shutdown code path. Ironically, a “graceful shutdown” call was in the stack trace. Maybe not-so-graceful after all.

EXCEPTION_RECORD: fffff8800e1048e8 -- (.exr 0xfffff8800e1048e8)
ExceptionAddress: fffff800386f30da (nt!memcpy+0x000000000000021a)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff

READ_ADDRESS: GetPointerFromAddress: unable to read from fffff800389d3168
GetUlongFromAddress: unable to read from fffff800389d31f8
ffffffffffffffff

FOLLOWUP_IP:
nt!AhcStreamWrite+6a
fffff800`38b6d6ea 498b0e mov rcx,qword ptr [r14]

5: kd> k
Child-SP RetAddr Call Site
fffff880`0e104b28 fffff800`38b6d6ea nt!memcpy+0x21a
fffff880`0e104b30 fffff800`38c3b11a nt!AhcStreamWrite+0x6a
fffff880`0e104b70 fffff800`38c3b1c6 nt!AhcpStoreKeySave+0x9a
fffff880`0e104ba0 fffff800`38c39dfc nt!AhcStoreSave+0x66
fffff880`0e104bf0 fffff800`38c39d1b nt!AhcCacheSave+0x4c
fffff880`0e104c20 fffff800`38c39952 nt!AhcCacheWriteRegistry+0x83
fffff880`0e104c50 fffff800`389ea8e2 nt!AhcShutdown+0x26
fffff880`0e104c80 fffff800`38730951 nt!PopGracefulShutdown+0x16e
fffff880`0e104cc0 fffff800`386a03d5 nt!ExpWorkerThread+0x142
fffff880`0e104d50 fffff800`386de116 nt!PspSystemThreadStartup+0x59
fffff880`0e104da0 00000000`00000000 nt!KiStartSystemThread+0x16
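
Reading a trace like this by eye gets tedious, so here is a small Python sketch of how the output above can be picked apart. It assumes the `k` output format shown (two address columns followed by `module!function+offset`), and the helper names `call_chain` and `describe_access_violation` are made up for illustration. The meaning of the first access-violation parameter follows the documented EXCEPTION_RECORD convention (0 = read, 1 = write, 8 = execute).

```python
import re

# Frames copied from the `k` output above: Child-SP, RetAddr, Call Site.
STACK = """\
fffff880`0e104b28 fffff800`38b6d6ea nt!memcpy+0x21a
fffff880`0e104b30 fffff800`38c3b11a nt!AhcStreamWrite+0x6a
fffff880`0e104b70 fffff800`38c3b1c6 nt!AhcpStoreKeySave+0x9a
fffff880`0e104ba0 fffff800`38c39dfc nt!AhcStoreSave+0x66
fffff880`0e104bf0 fffff800`38c39d1b nt!AhcCacheSave+0x4c
fffff880`0e104c20 fffff800`38c39952 nt!AhcCacheWriteRegistry+0x83
fffff880`0e104c50 fffff800`389ea8e2 nt!AhcShutdown+0x26
fffff880`0e104c80 fffff800`38730951 nt!PopGracefulShutdown+0x16e
fffff880`0e104cc0 fffff800`386a03d5 nt!ExpWorkerThread+0x142
fffff880`0e104d50 fffff800`386de116 nt!PspSystemThreadStartup+0x59
fffff880`0e104da0 00000000`00000000 nt!KiStartSystemThread+0x16
"""

# Two address columns (hex digits with a backtick separator), then module!function+0xoffset.
FRAME = re.compile(r"^[0-9a-f`]+\s+[0-9a-f`]+\s+(\w+)!(\w+)\+0x([0-9a-f]+)$")


def call_chain(text):
    """Return function names oldest caller first (the debugger prints newest first)."""
    matches = (FRAME.match(line.strip()) for line in text.splitlines())
    names = [m.group(2) for m in matches if m]
    return list(reversed(names))


def describe_access_violation(params):
    """Decode the two parameters of an access violation (code c0000005)."""
    kinds = {0: "read", 1: "write", 8: "execute"}
    kind = kinds.get(params[0], "unknown")
    return f"{kind} at {params[1]:#018x}"


if __name__ == "__main__":
    print(" -> ".join(call_chain(STACK)))
    print(describe_access_violation((0x0, 0xFFFFFFFFFFFFFFFF)))
```

Printed oldest-first, the chain makes the story easier to follow: the worker thread runs the graceful shutdown, which calls into the Ahc cache save routines, which end in the memcpy that reads from a bogus address.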

I don’t have anything installed on this notebook except Google Chrome, Steam (and various games managed through Steam), and current nVidia video drivers. So, after fiddling for an hour or two and getting nowhere, I gave up on fixing things and went to repair via the new Windows 8 option, found in PC Settings->General->Refresh. Unfortunately that didn’t work – the process also crashed during shutdown, so I had to try the heavier “wipe files, except for Windows 8 apps, and reinstall”, found in PC Settings->General->Remove. I had no Windows 8 apps installed so I just did it.

Both Refresh and Remove merely required me to insert the Windows 8 DVD and do something else for a while. But if that didn’t work and I was forced to reinstall from scratch (guimode setup and format my disk, etc.) then I would have gone back to Windows 7, since it would have been the same amount of work and I had ~3 years of generally problem-free use on Windows 7. Just the occasional bugcheck coming out of hibernation.

The Remove option did work, and my Windows 8 notebook booted up with no errors and a functioning network stack. Now all I had to do was apply patches… 30 of them. *GULP*. That’s what made my system bomb last time around.

Updates

That also didn’t take too long, and before going to sleep I installed Steam and queued up the games I had installed. That went all night and the next day after work I still had a few more to grab.

  • Antichamber
  • Civilization 5
  • Defense Grid: The Awakening
  • Elder Scrolls: Skyrim
  • EVE Online
  • Fallen Earth
  • Fallout 3
  • Lord of the Rings Online
  • The Secret World
  • Space Chem
  • Thief: Deadly Shadows
  • XCOM: Enemy Unknown
  • Arma II
  • Arma II: Dayz Mod
  • Arma II: Operation Arrowhead

A friend talked me into trying out DayZ, a mod that turns those Arma games into a zombie survival game. Normally I wouldn’t have an FPS installed since I’m not a fan of the genre.

So by Thursday night I had reinstalled those games, and then had to run/patch LoTRO, The Secret World, Fallen Earth, and EVE Online. There were various .NET Frameworks to download and reinstall, Visual C++ redistributables to download and install, DirectX to… you get the idea. On top of that, each game needed to download some amount of patches to update itself to the latest version.

Finally, I finished off with the 314.07 nVidia drivers. I hope the system runs for more than 6 months this time. 😉

Thank goodness a Steam reinstall isn’t that painful. It took a long time but at least I didn’t have to hunt down dozens of DVDs and babysit all those reinstalls.

Terminology

Syncaine asks if we need to come up with different terminology for the full spectrum of MMOs. Currently the term MMO covers a huge range, from ones that don’t require much time to ones that do, and everything in between. Syncaine’s post gives examples from bite-sized 30 minute chunks to “20hrs+, with solid 2-3+ regular hour blocks and being able to play during the prime nights (Tues, Thurs, Sunday), while also being able to schedule to play 3-4+ hours for something major”.

Thing is, activities that require 20+ hours a week, blocks during primetime, and scheduling for major events already have a name: part-time jobs. But jobs typically involve a salary.

So I would suggest calling them: unpaid part-time jobs!

I jest, sort of. Syncaine points out that Syp doesn’t mention other players; I would point out that Syncaine does not mention actually having fun playing in his manner (he does mention that it isn’t fun to solo or be casual in DarkFall). Manning a wall for hours, loss of investment and training after somebody moves on?

Hmm… actually sounds more like a seasonal or temporary unpaid part-time job.

GW2 – Video Issues

I’ve been having some video problems while playing Guild Wars 2, and after one recent episode of 4 crashes in 2 minutes, I decided to investigate. The symptom is: the game freezes (for a second or two), minimizes so I can see my desktop for a split second, then comes back. Total time “frozen” is 5 or 6 seconds, and most of the time I’m fine in-game. However, I’ve had this happen mid-fight (I come back to a character with damaged armor), I’ve had this happen as I was running in the vicinity of a precipice, etc. Sometimes the game crashes out entirely and I have to restart.

GW2 crash
Unfortunately, I’m too familiar with this dialog.

I’ve tried upgrading to the latest video driver, but it still happens. The next thing I’ll try is reducing every graphics option, hoping that a higher setting is triggering the problem.

The first thing I tried was looking in the Windows Event Viewer (Computer -> Manage; then open Event Viewer -> Windows Logs -> System). My recent 4 crashes were easy to spot:

4 crashes, 2 mins

The note about the video driver recovering really means “TDR thought the system froze so it terminated and restarted the graphics driver”.

There wasn’t too much info to glean, so I dug a bit deeper and located the minidumps these crashes create. I found them in the c:\windows\LiveKernelReports\WATCHDOG directory, and counted them up. There were 113 of them. Yes, 113. One hundred thirteen over the last month I’ve been playing GW2. According to /age in game, I’ve played a total of 90 hours across all characters. So I’m averaging a video driver crash/restart every ~48 minutes. Not too frequent, but enough to drive me a little nuts.
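
The arithmetic behind that ~48 minute figure is simple enough to script. This is only a sketch: the function names are invented for illustration, and the assumption that the dumps carry a `.dmp` extension is mine rather than something stated above.

```python
from pathlib import Path


def count_dumps(directory, pattern="*.dmp"):
    """Count dump files in a directory; returns 0 if the directory is missing."""
    folder = Path(directory)
    if not folder.is_dir():
        return 0
    return sum(1 for _ in folder.glob(pattern))


def minutes_between_crashes(hours_played, crash_count):
    """Average minutes of play per crash."""
    return hours_played * 60 / crash_count


if __name__ == "__main__":
    # Numbers from the post: 90 hours played, 113 dumps found.
    print(round(minutes_between_crashes(90, 113)))  # 48
    # Assumed location; on a machine without this directory the count is 0.
    print(count_dumps(r"c:\windows\LiveKernelReports\WATCHDOG"))
```

Note that this is a plain average; as the "4 crashes in 2 minutes" episode shows, the crashes tend to come in clusters rather than at even intervals.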

Now, to decipher what the minidumps contain, I started up a debugger. I used windbg, a simple and free yet functional/powerful debugger. It used to be directly downloadable from Microsoft, but these days it comes bundled with another SDK or DDK.

I opened the minidump with windbg (File -> Open Crash Dump) and poked around. Since a dump isn’t a live debuggable system, I was limited to looking at registers and a stack trace, but I was very curious to find out anything more. Initially, windbg complained about bad symbols so I pointed it to Microsoft’s symbol server:
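
A symbol path for the public Microsoft symbol server is commonly written in the `srv*<local cache>*<server URL>` form. The sketch below builds such a string and exports it through the `_NT_SYMBOL_PATH` environment variable, which the Windows debuggers read. The cache directory `c:\symbols` and the helper name `symbol_path` are assumptions for illustration, not necessarily the settings used here.

```python
import os

# Public Microsoft symbol server.
MS_SYMBOL_SERVER = "https://msdl.microsoft.com/download/symbols"


def symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Build a srv*<cache>*<server> symbol path string."""
    return f"srv*{cache_dir}*{server}"


# c:\symbols is an assumed local cache directory.
path = symbol_path(r"c:\symbols")

# Only affects this process and anything it launches afterwards.
os.environ["_NT_SYMBOL_PATH"] = path

if __name__ == "__main__":
    print(path)
```

With a path like this in place, the debugger downloads symbols for Microsoft modules such as nt and dxgkrnl on demand. Third-party drivers generally have no public symbols, so they still show up as a bare module+offset.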


6: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

VIDEO_TDR_TIMEOUT_DETECTED (117)
The display driver failed to respond in timely fashion.
(This code can never be used for real bugcheck).
Arguments:
Arg1: fffffa8006131010, Optional pointer to internal TDR recovery context (TDR_RECOVERY_CONTEXT).
Arg2: fffff8800f36e584, The pointer into responsible device driver module (e.g owner tag).
Arg3: 0000000000000000, The secondary driver specific bucketing key.
Arg4: 0000000000000000, Optional internal context dependent data.

Debugging Details:
------------------

Unable to load image nvlddmkm.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for nvlddmkm.sys
*** ERROR: Module load completed but symbols could not be loaded for nvlddmkm.sys

FAULTING_IP:
nvlddmkm+14b584
fffff880`0f36e584 ?? ???

DEFAULT_BUCKET_ID: GRAPHICS_DRIVER_TDR_TIMEOUT

TAG_NOT_DEFINED_202b: *** Unknown TAG in analysis list 202b

BUGCHECK_STR: 0x117

PROCESS_NAME: System

CURRENT_IRQL: 0

STACK_TEXT:
fffff880`03f1b630 fffff880`047555f7 : fffffa80`06131010 fffff880`047a1ec4 fffffa80`06131010 fffff880`04723843 : watchdog!WdDbgReportRecreate+0xa3
fffff880`03f1bb50 fffff880`047562bc : fffff8a0`242b0130 fffff8a0`242b0130 00000000`00000080 fffffa80`06131010 : dxgkrnl!TdrUpdateDbgReport+0xcb
fffff880`03f1bba0 fffff880`0472a6b3 : 00000000`00000001 fffffa80`0a3a1000 00000000`00000000 fffff880`0000004a : dxgkrnl!TdrCollectDbgInfoStage2+0x220
fffff880`03f1bbd0 fffff880`04756e0f : fffffa80`0a3ab658 ffffffff`fffe7960 fffffa80`06131010 00000000`00000000 : dxgkrnl!DXGADAPTER::Reset+0xef
fffff880`03f1bc80 fffff880`04637ec1 : fffffa80`061665b0 00000000`00000080 00000000`00000000 fffffa80`0a3ab010 : dxgkrnl!TdrResetFromTimeout+0x23
fffff880`03f1bd00 fffff800`03122e6a : 00000000`fffffc32 fffffa80`0a3a9610 fffffa80`054eeb30 fffffa80`0a3a9610 : dxgmms1!VidSchiWorkerThread+0x101
fffff880`03f1bd40 fffff800`02e7cec6 : fffff800`02ffee80 fffffa80`0a3a9610 fffff800`0300ccc0 fffff880`03f1be40 : nt!PspSystemThreadStartup+0x5a
fffff880`03f1bd80 00000000`00000000 : fffff880`03f1c000 fffff880`03f16000 fffff880`0c79ed70 00000000`00000000 : nt!KxStartSystemThread+0x16

STACK_COMMAND: .bugcheck ; kb

FOLLOWUP_IP:
nvlddmkm+14b584
fffff880`0f36e584 ?? ???

SYMBOL_NAME: nvlddmkm+14b584

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: nvlddmkm

IMAGE_NAME: nvlddmkm.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 503f8bb8

FAILURE_BUCKET_ID: X64_0x117_IMAGE_nvlddmkm.sys

BUCKET_ID: X64_0x117_IMAGE_nvlddmkm.sys

Followup: MachineOwner
---------

This just confirms the likely culprit is the driver for my graphics card, nvlddmkm.sys. The problem apparently occurs at offset 14b584 in the driver, and somebody would need source code or symbols to figure out what that is.
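
To make the "offset 14b584" part concrete: the debugger reports the faulting address as module base plus offset, so subtracting the offset from the address gives where the driver was loaded. A quick sketch using the values from the dump above (the helper names are made up for illustration):

```python
def parse_debugger_address(text):
    """Convert a debugger-style address such as fffff880`0f36e584 to an int."""
    return int(text.replace("`", ""), 16)


def module_base(fault_address, symbol):
    """Split 'module+offset' and return (module, offset, load base)."""
    module, offset_hex = symbol.split("+")
    offset = int(offset_hex, 16)
    base = parse_debugger_address(fault_address) - offset
    return module, offset, base


if __name__ == "__main__":
    # Values taken from FAULTING_IP in the analysis output.
    module, offset, base = module_base("fffff880`0f36e584", "nvlddmkm+14b584")
    print(module, hex(offset), hex(base))  # nvlddmkm 0x14b584 0xfffff8800f223000
```

The base comes out page-aligned, which is what you would expect for a loaded driver image. The offset is the part that matters to whoever holds the symbols, since it stays the same across reboots for a given driver build while the load address can move.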

There really isn’t much I can do at this point, other than maybe tracking down a way to report bugs to the vendor, and otherwise waiting for an update and hoping that fixes my issue. But at least I confirmed the problem and reported it to Microsoft (via the error reporting service)… who knows, maybe enough reports will come in that somebody will be motivated to investigate.

As far as playing, the timeout/crash/restart is very annoying, but other than several character deaths, it hasn’t impacted me too much. Typically I’m soloing or occasionally joining other guildies for a dungeon. If I were heavy into WvWvW I’d undoubtedly hate giving up a free kill every now and then due to ~5 seconds of client freezes.

I’m going to lower every graphics option, hoping that reduces the chances of this happening.