Lessons from a Failed Experiment in JavaScript Accessibility

May 18, 2015

Photo credit: daveynin

It’s quite common for password fields to have a “Show Password” function — a checkbox or button which temporarily converts the field to plain text, so that users can see the password they’re typing.

However, it occurred to me that this might not be secure for screenreader users, because a person who’s blind might not know if someone else is looking at their screen (for example, in a semi-public space such as an office or conference room).

Password fields are obfuscated for screenreaders just as they are visually — the screenreader will say star or similar for each typed letter, rather than announcing the letter itself. Just as it is for sighted users, the screenreader user has no interactive feedback to confirm that they’ve typed their password correctly.

So I wanted to explore the possibility of scripting this functionality, so that a screenreader user could hear what they’re typing while the field remains visually obfuscated. Then a user who’s listening with earphones would have the security of knowing that their password is not being revealed to anyone around them.

It didn’t work!

But along the way, I was reminded of some home truths about accessible JavaScript, and of an unfortunate problem with ARIA live regions which has wider implications for accessible scripting in general.

The First Prototype

The basic idea I had in mind was quite simple: an element with aria-live="assertive" would be added to the page, then whenever the password field was edited, its value would be copied to that element. Since it’s a live region, the new value would be immediately announced by screenreaders, but it would also be hidden with off-left positioning and wouldn’t be in the tab order, so it wouldn’t be otherwise apparent.

Here’s the code for that (or view the first prototype on its own page):
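A minimal sketch of that idea might look something like this. It isn’t the prototype’s actual source: the function names, the use of aria-pressed on the button, and the off-left offset are all assumptions made for illustration.

```javascript
// Sketch only: names, attributes and CSS values are assumptions,
// not the original prototype's code.

// Create a live region that screenreaders announce, but which is hidden
// visually with off-left positioning. A plain <div> is not focusable,
// so it stays out of the tab order.
function createLiveRegion(doc) {
  var region = doc.createElement('div');
  region.setAttribute('aria-live', 'assertive');
  region.style.position = 'absolute';
  region.style.left = '-9999px';
  doc.body.appendChild(region);
  return region;
}

// Copy the field's current value into the live region so it gets announced.
function announceValue(field, region) {
  region.textContent = field.value;
}

// Wire a toggle button to a password field. While enabled, every edit
// to the field is copied to the live region.
function enableHearPassword(doc, field, button) {
  var region = createLiveRegion(doc);
  var enabled = false;

  button.addEventListener('click', function () {
    enabled = !enabled;
    button.setAttribute('aria-pressed', String(enabled));
    if (enabled) {
      announceValue(field, region); // speak whatever is already in the field
    } else {
      region.textContent = '';
    }
  });

  field.addEventListener('input', function () {
    if (enabled) {
      announceValue(field, region);
    }
  });

  return region;
}
```

In a browser this would be called with something like enableHearPassword(document, field, button), where field and button are the password input and the Hear Password button.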

If you’re using an ARIA-supporting screenreader, the Hear Password button should enable this functionality and immediately speak whatever value is already in the field. Then while it remains enabled, the screenreader will announce the value as it’s typed (as well as saying star).

If you’re not using a screenreader, you won’t hear anything. Although it would be possible to implement direct text-to-speech using something like meSpeak.js, I don’t think that’s a good idea — because anyone who needs this functionality already has a screenreader.

More to the point, they have a screenreader in which they control the voice and speech rate; external TTS would just get in the way.

The following video demonstrates what a screenreader user would experience (using Firefox + NVDA):

Now one problem that’s immediately obvious is that copying the whole value to a live region means that the value is treated as a word; each time the user types a letter, the entire value is announced, rather than just the letter they typed. This is quite different from the behaviour of a standard text field, where the spoken feedback tells you the individual letters as they’re typed.

In that example I typed a password that could be interpreted as a word, but what if the value is more convoluted, something like "NCC-1701"? Well, in that case NVDA will say n c c one thousand seven hundred and one, which is problematic for another reason: it doesn’t tell us that the first three letters are capitals rather than lower-case, and it doesn’t announce the dash at all. The situation is slightly better in Jaws, which does speak the dash, but it still doesn’t mention the capitals.

This is a pretty fundamental problem. If your password happens to be a single lowercase word, then this solution would work quite well, but if it mixes upper and lower case, numbers or special characters, then you might not hear the whole password at all. You might, for example, think that you typed "ncc1701" rather than "NCC-1701", and the difference is obviously significant when we’re verifying a password!

And that’s not even a particularly difficult example. What if your password was something complex like "goHuE&-A"? NVDA would pronounce that as go hooey and a, which is really not helpful at all! Or what if your password was something like "gate", but you actually typed "gaite"? The spoken feedback would be exactly the same in both cases.

The Second Prototype

Maybe we could improve this by providing more detailed feedback. Instead of speaking the whole value whenever the value changes, we could speak the individual letters in response to keypress events.

Here’s the modified code for that approach (or view the second prototype on its own page):
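Again as a rough sketch rather than the prototype’s real source, the per-letter approach could look like this. The function names are assumptions, and the region argument is assumed to be an existing live region element like the one described for the first prototype.

```javascript
// Sketch only: the function names are assumptions for illustration.

// Work out which character a keypress event produced, or return null
// when the key did not produce a printable character.
function characterFromKeypress(event) {
  if (typeof event.key === 'string') {
    // Named keys such as "Enter" are longer than one character.
    return event.key.length === 1 ? event.key : null;
  }
  // Older browsers: fall back to the character code, ignoring control codes.
  return event.charCode >= 32 ? String.fromCharCode(event.charCode) : null;
}

// Announce each typed character individually through a live region.
function announceKeypresses(field, region) {
  field.addEventListener('keypress', function (event) {
    var character = characterFromKeypress(event);
    if (character !== null) {
      region.textContent = character;
    }
  });
}
```

Only the most recent character is ever in the live region, which is why nothing is re-read as a whole word.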

And here it is being used in Firefox + NVDA:

In one sense it’s better, since it provides more detailed feedback as you’re typing, but in another sense it’s worse, because it limits your input speed. The screenreader still says star for each letter before it speaks the actual character, so you have to wait for that to happen in order to hear the character; if you type more quickly, then you don’t hear any of the letters until you’ve stopped, at which point you only hear the last one (as demonstrated in the video).

So all this updated prototype does is solve one problem by creating a different one.

No More Prototypes!

Maybe there are ways we can improve it. Perhaps we could separate the characters of the value with spaces, for example, to output "h e l l o" instead of "hello", so that the individual letters are spoken instead of the whole word. That would work for letters, but it wouldn’t address the problem of not speaking capitals or punctuation, and would only compound the problem with passwords that contain spaces.
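To make that concrete, here is what such a transformation might look like; spellOut is a hypothetical helper, not something from either prototype.

```javascript
// Sketch only: spellOut is a hypothetical helper for illustration.
// Put a space between every character so each one is read separately.
function spellOut(value) {
  return value.split('').join(' ');
}

spellOut('hello');    // "h e l l o"
spellOut('NCC-1701'); // "N C C - 1 7 0 1"
spellOut('a b');      // "a   b" (the real space is lost among the separators)
```

The output still depends on the screenreader choosing to speak capitals and punctuation, and a password containing a space becomes ambiguous.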

In any case, we’re well into the realms of ugly hacking now! Sometimes, it gets to the point where more hacking just leads to more hacking, and we have to step back from what we’re doing, and ask — was the whole thing a bad idea? Unless we can make it respond just like a regular plain text field, we’re never going to have an intuitive solution.

And as watchwords go, that’s pretty useful. If we can’t create an intuitive solution — if the only way to do something is to endlessly hack around the minutiae of a kludgy solution — then maybe it wasn’t worth doing in the first place.

Is This Thing Live?

But there’s a bigger problem to contend with that makes it all rather irrelevant, which is that it doesn’t work in IE + Jaws; since that’s the most popular combination, failing to work there is a pretty big deal.

Here’s the first prototype again being used in IE + Jaws:

You can see how pressing the Hear Password button simply announces the updated state of the button; it doesn’t read the value in the live region. The reason for this comes down to the behaviour of aria-live="assertive".

The aria-live attribute has three possible values, which determine how assertive the message should be:

"off" means the change in content is not announced.
"polite" means the change should only be announced when the user is idle (i.e. waiting until the screenreader stops whatever it’s speaking at the time).
"assertive" means the change should be immediately announced (i.e. interrupting whatever the screenreader is speaking at the time).

But recently, Russ Weakley and I did some testing with ARIA live regions in preparation for a forthcoming Learnable course, and what we discovered is so significant that it deserves a bold paragraph all on its own:

Assertive live regions are not usually assertive.

In other words, most browser/screenreader combinations don’t actually honour the "assertive" value; they treat it the same as "polite". Specifically, only VoiceOver with Chrome or Safari will actually interrupt the user to announce an assertive message; all other tested combinations (including NVDA or Jaws with any browser) won’t announce it until the screenreader is silent.

What we have in these prototypes then, is an assertive region being treated as polite. The value did get announced in Firefox + NVDA because clicking the Hear Password button doesn’t trigger anything else — NVDA doesn’t read the updated button text, and since nothing else is being spoken, a polite message can be announced. However in IE + Jaws, clicking the button does cause it to read the updated button text, which means that the reader is not idle, and therefore the live region isn’t announced.

Conclusion

None of this should be taken as a reason for not implementing standard Show Password functionality (i.e. that simply switches the field type). Those solutions can be accessible to screenreaders; the only concern is the potential security/privacy risk they present.

But if the alternative is to give users no way of seeing their password at all, then providing standard Show Password functionality is arguably better than doing nothing. (Though there might be a usability case for saying that password fields simply shouldn’t be masked at all, but such questions are beyond the scope of this article.)
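For reference, the standard approach mentioned above (simply switching the field type) could be sketched like this. The function name and the button labels are assumptions for illustration.

```javascript
// Sketch only: function name and button labels are assumptions.
// Switch a field between masked and plain text, and update the
// button text to describe what pressing it will do next.
function toggleShowPassword(field, button) {
  var showing = field.type === 'text';
  field.type = showing ? 'password' : 'text';
  button.textContent = showing ? 'Show Password' : 'Hide Password';
  return !showing; // true when the password is now visible
}
```

In a page, this would typically be called from the button’s click handler, passing in the password input and the button itself.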

We could perhaps mitigate the risk by explaining it for screenreader users, which we can do by adding aria-label to the triggering button. When a button or link has aria-label, a screenreader will speak the label text instead of the element’s content — so it must at least contain the same information, but can also have supplementary information that’s only relevant for screenreader users (and this technique has a whole variety of uses):

<button aria-label="Show password as plain text. Note: this will visually expose your password on the screen.">
  Show Password
</button>

Ultimately though, what’s important to take from this experiment is a basic truth about accessible scripting: if it can’t solve a problem in a usable and intuitive way, then it hasn’t solved the problem.

Accessibility is not an exercise in meeting abstract checks; it’s about improving usability for people with specific needs. So a solution that’s technically accessible but not usable for these groups of users can’t really be considered accessible at all.