Online Experiments involving Surveys

The challenge of designing online experiments using survey software.

I’ve spent the last couple of months working on an experiment-design framework as a follow-up to an initial study concerning pragmatic understanding of UI designs.

Though I discovered a few frameworks out there for constructing online experiments, I’m working on several experiments involving hundreds of submissions and some specific requirements.

I started down the path of writing my own JS front-end built on top of a node.js server and mongoDB. I learned early on that I could avoid back-end development by using a service like Firebase. But I spent quite a bit of time spinning up on the angular.js front-end framework, since it was something I had started to use for other purposes. Both Firebase and angular.js are spectacular new technologies, and I wish I had more time to go down this route!

However – time is flying and progress has still been slow. Following my schedule, I set aside front-end development and moved on to learning how to integrate with Mechanical Turk. And… I quickly discovered an alternative development option.

I discovered the Qualtrics Research Suite, which allows researchers to create surveys with branching display logic and provides a JavaScript API that makes it easy to embed custom JavaScript and interact with AMT. Even luckier, my university has an account, and by logging in with my credentials I have full access to this capability.

qualtrics survey flow

Hopefully, I will have a new blog post in a few weeks detailing how to extend Qualtrics for the design of behavioral UI experiments. Thus far, everything seems doable. Qualtrics uses Prototype.js and has made it incredibly easy to interact with its Question API.
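To give a taste of what this looks like, here is a minimal sketch of a question script that logs a reaction time into embedded data. The `addOnload`, `setEmbeddedData`, and `questionclick` entry points are real Qualtrics Question API hooks; the stub at the top is only a stand-in for the survey runtime so the sketch can run outside Qualtrics, and the question id (“QID1”) and field name are hypothetical.

```javascript
// Sketch: record a per-question reaction time as Qualtrics embedded data.
// The stub below only mimics the survey runtime so this runs outside Qualtrics;
// in a live survey you would paste only the addOnload block into the
// question's JavaScript editor.
if (typeof Qualtrics === "undefined") {
  var Qualtrics = {
    _embedded: {},                                // captures setEmbeddedData calls
    SurveyEngine: {
      addOnload: function (fn) {
        var question = { questionId: "QID1" };    // hypothetical question id
        fn.call(question);                        // Qualtrics binds `this` to the question
        Qualtrics._question = question;           // exposed so we can simulate a click
      },
      setEmbeddedData: function (key, value) {
        Qualtrics._embedded[key] = value;
      }
    }
  };
}

Qualtrics.SurveyEngine.addOnload(function () {
  var shown = Date.now();                         // when the question was displayed
  var question = this;
  question.questionclick = function () {          // fires when the participant clicks
    Qualtrics.SurveyEngine.setEmbeddedData(
      question.questionId + "_rt", Date.now() - shown);
  };
});
```

Inside a live survey, the embedded-data field written this way can then be referenced in survey flow logic or exported alongside responses.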

Routinizing the UI

Why is consistency in user interfaces so important?

Interacting with graphical user interfaces (GUIs) sometimes feels conversational and sometimes not. A dialogue box that asks a yes-no question feels much the same as a verbal yes-no question. And a blog post can feel very un-conversational in the sense of a lecture or commentary.

When we speak our own language about everyday matters, we speak effortlessly with little thought to how to produce an utterance and little thought to how to understand one spoken to us. It’s a bit different with written language. We endure years of school learning to read and write following style and rhetorical guidelines which we memorize and practice endlessly until we know how to recognize and avoid passive constructions - as well as generate them with practiced ease. Reading and writing text is just not as intuitive as speaking conversationally and most people require years of practice to achieve any competency.

When GUIs sprang into public view in the 70s and 80s, it became possible for people to interact with a computer without having to learn seemingly arbitrary, complex commands typed into a terminal window. GUIs made sense. You could point and push buttons. An icon was a symbol that stood for something meaningful like a “program”. And there was plain old English language mixed in with icons and graphics (since GUIs originated in the United States). GUIs felt natural.

During all of the hype of GUI-based interfaces, software developers learned to adopt the notion of software patterns. The idea was to document shared knowledge of best practices for solving common or typical problems in software design. Software patterns are a fantastic concept for anyone learning to write software. Writing software and designing UIs has much in common with learning how to read and write text. It requires a lot of practice before you become any good at it. Software design patterns help new programmers speed up the process so they don’t have to figure out how to solve every problem encountered through brute trial-and-error.

When desktop computers became really popular – and web interaction even more so in the 2000s – some software engineers began to specialize in “front-end” development. Anyone dabbling in web interaction might consult the Yahoo UI design pattern library while developing a new web application. In fact, Yahoo’s purpose for its design pattern library was to solve a business problem: they wanted a way to communicate standards across development teams in order to increase “consistency, predictability, and usability” across their site – and their brand.

Human-Computer Interaction (HCI) guidelines capture specific problems, examples, usage, rationales, supporting research, standards, and so on. Patterns range across stylistic conventions (e.g., page headers and footers), attentional mechanisms such as animation, navigation and organization, layout, common functions such as registration or login, and even more complex patterns such as social sharing and feedback.

In addition to useful, everyday patterns, the notions of anti-patterns and dark patterns document practices in common use that may be ineffective (anti-patterns) or ethically questionable (dark patterns).

It turns out that when we learn language, some concepts and patterns become entrenched with frequent use. So, for example, the phrase “I don’t know” is not something most of us have to think about before uttering. The grammatical construction I (subject) + do (1st person aux verb) + negation + know (infinitive) is not something you have to think about assembling before you say it. Such forms become routinized through repeated activation and use, such that both production and understanding require less cognitive effort and become more automatic.

This sort of automatization is not limited to conversational speech. Routinization happens at many levels of production and understanding. For example, when you see a familiar word like “key”, pronunciation is automatic. It also interacts with syntactic chunks such that when someone says, “cat in the ___”, you anticipate “hat”.

cat in the hat

Moreover, when you see a tall hat with red and white stripes, you may immediately think cat-in-the-hat, as well as “cat”, “Dr. Seuss”, and any number of related concepts. According to Pickering and Garrod (2004), priming may occur at different levels, including the lexical, syntactic, semantic, and situation-model levels.

Routinization isn’t limited to long-term memory. Suppose you are having a conversation with a friend. You are talking about a movie of which neither of you can remember the name. So you say, “that movie with Harrison Ford”. It would not be surprising if your friend then referred to the same movie as, “the Harrison Ford movie”. You can routinize a reference to something during the course of an interaction in order to communicate more easily.

Priming is applicable to GUI patterns. When the GUI presents a dialog box like the one below, it is familiar. It offers a choice (1) or (2). Typically, the choice is binary - cancel or accept; yes or no; permit or deny, etc. You read the text and make a choice.

dialog example 1 Image credit - (Why ok buttons in dialog boxes work best on the right)

Once you’ve seen this pattern, you don’t have to ponder over similar dialog boxes each time you encounter one. In fact, the more often you see and recognize a design pattern, the more entrenched it becomes. The same cognitive architecture that supports your understanding of language also supports and facilitates understanding of user interfaces.

So let’s talk about the difference between interaction design patterns and linguistic patterns. One obvious difference is that while we use language all day long, only a few of us know how to produce UIs. And when people interact with UIs, they aren’t interacting directly with the designer. So the designer doesn’t get direct (or continuous) feedback on how well the user understands the interaction. Production is not directly linked to comprehension in a real-time feedback loop as in face-to-face dialog; it is a bit more like interacting with monologue text. This means that patterns are not aligned and refined in the same way.

Interaction design patterns help improve communication, but because they are easily modified by the designer with no direct feedback about how such changes affect a user’s comprehension, there is a propensity for non-obvious error.

In fact, there might be a lot of information packed into a dialog box. Dialog boxes don’t have to be simple binary choices. The UI designer can make a dialog box for any purpose. Here’s a very simple one where explicit choice is omitted. I guess I can say “not okay” by clicking the “x”. Maybe. Hmmm.

dialog example 2

Here’s one where the designer decided that a form could be a dialog box. What happens if I don’t fill something out right?

dialog example 3

This is a familiar open-document dialog. Sometimes you can open more than one document, but there is no way to know without trying it.

dialog example 4

Here’s a complex dialog that combines the basic cancel, accept pattern with other buttons and choices. From experience, I expect that I can do a bunch of things and then choose “OK” or “Cancel” when I’m done. But it requires a bit more thought on my part – and also a bit of trust where if I spend a lot of time on this and then mess something up, I don’t know what will happen.

dialog example 5

Here’s a dialog box where I worry that if I click on the hyperlink, I won’t know whether I’m still in this dialog or have been sent off on a wild goose chase.

dialog example 6

Below, is it more confusing if I move the “cancel” button like this? Will the user even see the “cancel” button? Will they become confused because the design conflicts with expectation?

dialog example 7

Considering the examples above, it’s easy for UI and software designers to break the simplicity of such a basic design pattern: by omitting an explicit choice, embedding whole forms, piling on extra buttons and options, mixing in hyperlinks, or moving familiar buttons to unexpected places.

Common design patterns are easily broken. Any designer or developer with a text editor can do so without understanding the impact on comprehension. Producing comprehensible GUIs requires as much practice as writing clear, well-structured text. Design patterns arguably serve as cognitive aids, boosting the mechanisms that support routinization and automatization of understanding. But as designers we need to be very sensitive to the effects of alteration, no matter how benign a change might seem. We learn such sensitivity writing prose. Why not user interaction?

The cognitive basis for academic workflows

The importance of cognitive workflow in long works.

Last Fall, late into comprehensive exams, I paused to consider the effort it took to research and produce answers to questions that would only be used in a 2-hour comprehensive exam. Of course, if I was lucky, some of the writing might make it into my dissertation. And the time spent reading, thinking, and learning was potentially invaluable. But the process was painful. For each of four papers, I considered my question, amassed sources, crammed notes into Evernote, outlined, wrote, re-wrote, re-thought, re-searched, re-collected, re-assembled, re-drafted, and finally spent a day inserting and formatting sources properly. That was just for a first draft.

In the end, all of my work was in disconnected Evernote notes and Word documents.

Like many other doctoral students, I came to the realization that my method was incredibly inefficient and could not scale. In fact, I realized that my largest problem was re-using my own thinking, knowledge, and experience.

So I read everything I could find on academic workflows.

What I found was a huge discussion of the relative merits and trade-offs of organization (knowledge management) tools, brain-storming tools, note-taking, bibliography, and writing tools. Naturally, I downloaded a variety and began to explore. At first, I couldn’t see why there were so many over-lapping functionalities. For instance, various students and researchers would admit to using multiple brain-storming tools simultaneously. What the heck?! (I’ll get back to this later.)

Fairly quickly, I came to the conclusion that my best chance at settling on a workflow was to study the workflows of researchers who were very skilled. I reasoned that someone who publishes a lot was probably good at re-using previous thinking and work. This post by Steven Berlin Johnson describing his use of DevonThink is representative.

While developing my own workflow, I ran across an article by Pirolli and Card (2005) that reminded me intensely of academic workflows! In fact, they had done a cognitive task analysis (CTA) of intelligence analysts. What they described was sense-making. In essence, sense-making is a set of processes for fitting data into a mental model and fitting a model around the data (Klein, Moon, & Hoffman, 2006). The first part (foraging for data) is posed as an “information foraging loop”, while the second (fitting data to a model) is a “sense-making loop”. Another way to look at this is as an interleaving of bottom-up processes with top-down processes: search, filter, and understand versus find evidence, examine counter-arguments, and re-evaluate. The diagram below is extracted from Pirolli and Card (2005).

cognitive task analysis

My current workflow is represented below. In effect, it represents a simple CTA that generalizes the academic workflow I now use.

sense-making

Though I drew in arrows that indicate a sort of continuous back-and-forth process at a micro-level, there do seem to be larger foraging and sense-making loops. It seems so, since I spend a day or days at a time doing focused work within specific tools. In fact, I spend a lot of time in Scrivener. Writing seems to require much more effort than note-taking in other parts of my workflow. Before adopting this workflow, I probably spent most of my time in the foraging loop – and then lost or forgot much of the learning I had attained. (Let’s face it, hunting is more fun.)

The major tools I use appear under the loops in the diagram above.

Not shown is a production tool. I typically export from Scrivener to LaTeX (the subject of a future post).

Another way to look at this process is in terms of note-taking. At each stage in the process, my notes become more compact and inter-connected. When I read an article, I may outline summaries and extract concepts and quotes. But when I’m sense-making, I’m primarily working in Tinderbox or OneNote, and what’s important is thinking and bridging – the relationships between notes. These notes are typically short, but need to link back to summaries, extracts, and sources. When I write prose, I work in Scrivener. The link back to intermediate sources (Tinderbox & OneNote) and original sources (Papers2 & DT) is important to maintain. Sometimes I need to go back and work on material in more depth. But, in the end, all of these intermediary products are re-usable.

Now, back to the question of overlapping functionality. Why is it that, at least on the Mac, there is so much overlap in functionality between “academic workflow” tools? Why would I want to take notes in multiple tools? It turns out that note-taking is a multi-faceted beast. Depending on where I am in the sense-making process, I’m taking different sorts of notes. Below are different sorts of notes serving different purposes, which may be used in different parts of the sense-making loop. I adapted the scheme below based on an analysis of historians using DT.

tabular system of note-taking

Labels, tags, notes (summaries, maps, highlights, etc.) are all really just notes. We just use them in different ways. And some are obviously more compact than others. As Mark Bernstein of Eastgate remarks:

(These remarks can be found in Bernstein’s talk in the YouTube video linked at the bottom of this post.)

There are also other small workflows which impact productivity in a cumulative sense. One of the most valuable has been a small workflow triggered whenever I save a PDF to the desktop. When this happens, a Hazel script immediately imports the document into Papers2. This forces me to source the document right away, so that when I need to reference it downstream, it’s already entered into Papers2.

In the end, I hope to capture my own knowledge and experience over time so that I can build on previous work without having to re-think and re-organize every time I start a new paper. It’s an experiment in progress. FYI - I still use Evernote. But more for random odds-and-ends such as notes on programming and how to keep my blog running smoothly.

Here are some of the references online that I found particularly inspiring for my own workflow. Interestingly, I found lots of useful information about DT from historians, while historical fiction writers seem more excited by tools like Tinderbox. Scrivener, however, definitely crosses boundaries. Apparently, all of us are deeply concerned with writing and production. Enjoy!

Klein, G., Moon, B., & Hoffman, R. (2006). Making Sense of Sensemaking 2: A Macrocognitive Model. IEEE Intelligent Systems, 21(5).

Pirolli, P., & Card, S. (2005). The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis. In Proceedings of the International Conference on Intelligence Analysis.

Perception of the world influenced by language

Does language affect our perception of color?

Color grid

It is interesting to think that we still have much to learn about the most studied part of the brain: the visual cortex. Given this, how is it that we understand so little about such a basic neurological phenomenon as “visual integration”? By this, I mean: how do we take basic signals coded as color, shape, orientation, and motion and make meaning of them?

A persistent, yet controversial concept, in linguistics is known as the Whorf-Sapir Relativity Hypothesis. This hypothesis holds that language itself affects how we see the world: our perception of the world is influenced by language. Though over the years, cognitive psychologists and others have made reference to Whorf-Sapir in a variety of studies, I had not seen any neurophysiological support for this theory until recently. In fact, there is plenty of evidence to suggest that humans all perceive basic visual stimuli similarly. So how could language possibly affect perception?

Color categorization has long been of interest to linguistic anthropologists and cognitive psychologists. Berlin and Kay (1969) conducted a cross-linguistic study of color categorization, discovering that color categories are not uniform across languages. There seemed to be focal, or prototypic, colors, but the number varied depending on the language. Also, boundary members varied in terms of categorical membership (that is, whether a particular color was classified as blue versus green). However, in their study, speakers of languages that classified colors differently did not perceive colors differently. Berlin and Kay hypothesized a set of universal constraints and, thus, some predictability in color categorization. Their study also seemed to provide strong evidence against the relativity hypothesis: linguistic categories do not affect or change perception. (Also, see Kay and McDaniel (1978), which is available as a downloadable PDF.)

In the intervening years, a number of researchers have countered Berlin and Kay’s theory, primarily on the basis of methodological flaws. More recently, however, a 2009 functional MRI study (Siok et al.) has lent credible biological evidence that language plays a role in the categorical perception of color. The faculty for language is found in the left hemisphere of the brain. By looking at brain activation maps across both hemispheres in a visual search task, these researchers discovered that, for stimuli in the right visual field, enhanced activity in the visual cortex coincided with enhanced activity in the part of the left hemisphere concerned with lexical processing. The results suggest that language may provide some top-down control modulating the activation of the visual cortex.

In 2012, Loreto, Mukherjee, and Tria published a paper in PNAS demonstrating through a multiagent simulation how a population, subject to the perceptual constraint of the “Just Noticeable Difference”, categorized and named colors through purely cultural negotiation. They claim strong quantitative agreement with the World Color Survey (WCS) pioneered by Berlin and Kay.
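The cultural-negotiation dynamic at the heart of such models is easy to sketch. Below is a much-simplified version: the basic “naming game” over a single object, without the JND constraint or any actual color continuum, so it is only a cartoon of the PNAS model. Agents pair up at random; a speaker utters a name from its inventory (inventing one if the inventory is empty); on success both parties collapse their inventories to the shared name, so the population converges on one convention.

```javascript
// Minimal naming game: a population negotiates a shared name for one object.
// A cartoon of cultural negotiation, not the full category game (no Just
// Noticeable Difference constraint, no perceptual color space).
function namingGame(nAgents, nRounds) {
  var agents = [];
  for (var i = 0; i < nAgents; i++) agents.push([]); // each agent's name inventory
  var invented = 0;
  for (var round = 0; round < nRounds; round++) {
    var s = Math.floor(Math.random() * nAgents);     // speaker
    var h = Math.floor(Math.random() * nAgents);     // hearer
    if (s === h) continue;
    var speaker = agents[s];
    if (speaker.length === 0) speaker.push("name" + invented++); // invent a name
    var word = speaker[Math.floor(Math.random() * speaker.length)];
    if (agents[h].indexOf(word) !== -1) {
      agents[s] = [word];   // success: both agents drop all competing names
      agents[h] = [word];
    } else {
      agents[h].push(word); // failure: the hearer learns the name
    }
  }
  return agents;
}
```

Run long enough, every inventory collapses to the same single name. The interesting result in Loreto et al. is that adding the JND perceptual constraint to this kind of negotiation yields emergent color categories that agree quantitatively with the World Color Survey.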

In the end, the question of whether language affects perception does have a foundation in neurophysiological processes – perhaps not to the extent originally conceived by linguistic relativists. But maybe there is additional supportive evidence not quite so dependent on basic visual stimuli.

Berlin, B., & Kay, P. (1969). Basic Color Terms. Berkeley: University of California Press.

Kay, P., & McDaniel, C. (1978). The linguistic significance of the meanings of basic color terms. Language, 54(3), 610–646.

Loreto, V., Mukherjee, A., & Tria, F. (2012). On the origin of the hierarchy of color names. Proceedings of the National Academy of Sciences, 109(18), 6819–6824.

Siok, W., Kay P., Wang, W., Chan, A., Chen, L., Luke, K., & Tan, L. (2009). Language regions of the brain are operative in color perception. Proceedings of the National Academy of Sciences, 106(20), 8140-8145.