Hapax Legomenon

Robin Williams, suicide, and “getting it”

Posted on August 13, 2014 by nkrishna

Robin Williams, who starred in many formative movies of my childhood and released comedy that had my high-school friends and me in stitches, took his own life this week. He’d struggled with alcohol and drug addiction, and severe depression, and in the end decided to end it.

I don’t get it, and that’s why it’s scary.

I mean, intellectually I get it. I can see the reasoning. Though I don’t personally suffer from depression, I know people who do and have seen its effects. I’m told and have witnessed how the world seems to turn gray and the usual pleasures seem empty and how although you know at the back of your mind that you’re loved, you don’t see it. It sounds like you’re not sad—you just don’t care. You don’t feel anything, so in the end, taking that final step seems like it might make you feel something. It’s actually a quite logical line of thinking to an outsider, so in that sense I understand it.

But I don’t “get” it.

I doubt I ever will. It’s one of those experiential things where, if it hasn’t happened to you, the closest you can get is a second-hand account.

Because it’s an illness of the mind, and when the thing that normally diagnoses problems is the thing with the problem, it can’t see the forest for the trees, so to speak. The mind can’t ever be truly outside itself, so it can’t see its whole self. The best you can do is seek the help of another mind, but even that’s an imperfect solution made harder by social and internal obstacles.

So we end up with wealthy, successful, brilliant Robin Williams, who apparently had everything but a way out.

At least, that’s how it seems to me. I’m probably wrong.

Posted in Media, Personal | Tagged mental illness, robin williams, suicide | 2 Comments

Book Review — The Horse, the Wheel, and Language

Posted on August 9, 2014 by nkrishna

Author: David W. Anthony
Publisher: Princeton University Press

My first love in linguistics is probably the historical variety. But no one will pay me money to do that so I do all this computational stuff instead. Still, historical linguistics and me go way back, especially Indo-European linguistics, and I try not to miss an opportunity to read more on the topic.

It’s a topic that is, to put it mildly, fraught, both with uncertainty and hotly contested theories and with a spotty history and an unfortunate legacy brought on by romantic nationalism and social Darwinism. All this still leaves us asking, over 200 years after Sir William Jones kickstarted both comparative linguistics and Indo-European studies, can the question of Indo-European origins be answered at all? David W. Anthony, a professor of anthropology at Hartwick College in upstate New York, says “yes.” Let’s watch.

This book attempts to relate a fairly comprehensive history of the origins and spread of the Indo-European language family, from the Neolithic all the way to the early Iron Age. Along the way, the author tackles a number of thorny questions about Indo-European origins: What is a language family? When does a language become distinct? In what period can we rightly call the language “Indo-European”? And of course, where was it spoken? And by whom?

Anthony favors Marija Gimbutas’s Pontic-Caspian steppe homeland for the original Indo-European speakers, but eschews the dubious essentialism that plagued Gimbutas’s later work (basically, she ended up arguing that violent, chauvinist steppe riders rampaged through and demolished the peaceful, matriarchal society of Old Europe, imposing a new order by fiat, which is a rather simplistic view, and one not very well-evidenced). As the Kurgan hypothesis is the one that makes most sense to me, personally, I like that Anthony goes with it, but even for skeptics, he makes a solid argument, weaving together both the linguistic evidence and the archeological, and so unless you’re a hardcore Anatolian hypothesis person, you’ll probably find it convincing on at least some level.

The book is roughly half linguistics and half archeology. On one hand, you get the usual spread of reconstructed Indo-European terms for tools, pastoralism, and ritual and what they imply about the society that used them, and you also get lots of tables and charts and discussions of pottery and artifacts. Maybe you’re not particularly interested in pots, but Anthony gives a comprehensive archeological history of the Pontic-Caspian steppe from about 6000 BCE through to the late 2nd millennium BCE, complete with the Pre-Indo-European cultures and their neighbors, and by the time we really get to the Indo-Europeans proper around page 300, you’re left realizing that the origins of the language family, its speakers, and their culture make no sense without the surrounding context. Like all people, the oft-romanticized Indo-Europeans didn’t arise in a vacuum—they were as much shaped by their neighbors and their environment and the vagaries of chance as anyone else.

To someone already familiar with most of the linguistics and cultural stuff (you get the basic intro to things like the comparative method and how culture can be inferred from vocabulary), the archeology portions of the book, though dense, were probably the most enlightening part. There have been a lot of prehistoric graves ~~robbed~~ excavated in the service of Indo-European reconstruction, and it seems like in these pages, we pay a visit to nearly all of them.

We meet all our favorite Indo-European language groups, though special attention is given to Germanic, Greek, Anatolian, Tocharian, and Indo-Iranian, such that Celtic, Italic, and Balto-Slavic feel like they’ve been given short shrift. Anthony seems to have his reasons, though, in that he focuses on what he calls “persistent [archeological] frontiers” associated with a particular language group. It is his self-appointed task throughout the book to make the association between language and archeology, despite the seemingly insurmountable stumbling block that pots can’t speak and words leave no bones. Yet if you begin this book thinking that bit-wear on horses’ teeth has nothing to say about a society or that the vast steppes of prehistory were just empty wasteland full of barbarians, these notions are quickly and effectively dispensed with. We see funeral rites compared with texts like the Rig Veda and Avesta, we see the apparently male-centered culture of the eastern Indo-European frontiers contrasted with the greater female representation in the graves of the western frontiers and how that’s reflected in the related mythologies of the associated groups (e.g. Valkyries: female, Maruts: male), we see how the mineral resources of the Eurasian steppes opened up trade with the more famous and settled civilizations of Sumer and the Indus Valley, allowing for Anthony’s model of patron-client relationships between Indo-European steppe-dwelling chiefs and local fiefdoms, allowing for the spread of their language, the ancestor of this one and over 400 more spoken by half the globe today. Throughout, it becomes clear that the same mechanics of privilege and prestige that drive language politics today functioned essentially the same way on the steppes of Ukraine 5000 years ago.

There is the occasional spelling error and since a lot of names given are of places in Russia and Ukraine, if they’re spelled differently in different places I can’t be 100% sure which is the right one. But overall, they don’t distract and the book is readable without losing any of its technicality. Though often prone to speculation (what else can you really do when talking about a culture that’s been dead for 4000 years?), it’s always backed up by well-argued facts and evidence related in prose that’s about as flowing as one can expect to get from an academic archeology book. Anthony is a good writer, sometimes funny, sometimes lyrical. One consistently gets the feeling that he’s awed by the Central Asian steppe environment he writes about.

This is not a narrative, nor is it a pop-linguistics book. You don’t have to be an archeologist or an anthropologist to read this, but it probably helps, and the synthesis of evidence from those two fields, coupled with linguistics, is a compelling and well-thought-out effort. While we may never know the names of the horse-riders of the Eurasian steppes, we have evidence of their deeds, and their words are still with us, which is ultimately, as Anthony shows, a powerful tool in unraveling the Indo-European knot.

Posted in Books, History, Language, Science | Leave a comment

On Linguistics and Public Outreach

Posted on March 24, 2014 by nkrishna

A couple weeks ago, I posted three questions about science communication in the field of linguistics:

Who is the Carl Sagan or Neil DeGrasse Tyson of linguistics? In other words, who do we have in linguistics who is an effective presenter of the ongoing work in the field?
Is there one? (Implicitly, do we need one at all? Or why not many?)
Leaving the question of specific personality(ies) aside, how do we as linguists (of any stripe) better present the work we and our colleagues do for effective public consumption?

I got a number of good responses, so here’s a summary along with thoughts about each. These are my personal opinions, so feel free to disagree—vehemently if you like.

Continue reading →

Posted in Language, Media, Science | Tagged communication, cosmos, linguistics, pr, science | Leave a comment

Statistics Aren’t Enough

Posted on January 17, 2014 by nkrishna

There’s a cute quip: statistics don’t lie.

There’s an accurate quip: statistics don’t lie, given a well-trained model and a large enough sample.

With statistics on your side, you can be 99% sure that you’re right. Unfortunately, people who are 99% sure are wrong 40% of the time.

I was recently thinking about the way Hindi and Urdu represent possession. The language doesn’t have a verb “to have,” and so uses three different constructions depending on the type of possession and the object possessed.

Long story short, if the possessed object is physical and alienable (possession is impermanent), you use ke pās, or “near.”

Rajesh ke pās ek kitāb hai
Rajesh.GEN-near one book is
“Rajesh has a book.”

If it’s nonphysical and alienable, you use ko, or “to.”

Rajesh ko ek bukhār hai
Rajesh-to one fever is
“Rajesh has a fever.”

Finally, if it’s inalienable (things that always belong to you, like body parts or relatives), you use a copula construction.

Rajesh kā bhāī hai
Rajesh-of brother is
“Rajesh has a brother.”

Now, let’s say you were trying to learn Hindi using the state-of-the-art machine translator, Google Translate. How would it treat you when it comes to possession?

The answer, it turns out, is not so well.

Input: “Rajesh has a fever.” Output: “Rajesh is a fever.”

Input: “Rajesh has a book.” Output: “Rajesh is a book.”

Input: “Rajesh has a brother.” Output: “Rajesh is a brother.”

Clearly, Google Translate is missing something here–the entirety of the possession entailed by the English “have.” But watch what happens below:

Input: “I have a fever.” Output: “I have a fever.”

Yay! So maybe it works with “I”, but not a third person?

Input: “I have a book.” Output: “I is a book.”

Okay, maybe not?

We get a clue from “I have a fever.” Notice how below the translation, it notes it as a phrase? “I have a fever” is a common enough phrase that it’s shown up in some Hindi-English corpus that Google Translate was trained on, so it “knows” the phrase “I have a fever,” and that it corresponds to mujhe ek bukhār hai (mujhe is a contraction of mujhko, “to me”).

It turns out the ko construction is used in a number of common phrases regarding feeling (including sickness) and emotion:

mujhe yah kitāb pasaṃd hai = “I like this book”
mujhe tumse pyār hai = “I love you”

All these kinds of phrases are common enough that Google Translate has surely seen them in a corpus. It can scan along its input, find familiar phrases and then replace them with their equivalents: “I have a fever,” she said becomes “X,” she said and since the translator knows that there’s a high probability that X as a block translates to mujhe ek bukhār hai, it just swaps it in, and is left with “mujhe ek bukhār hai,” she said, and only two words left to translate. This is called “phrase-based” translation, which Google Translate uses to take a lot of work out of its translation task, breaking inputs up not into the individual words, but into bigger chunks that it can translate wholesale.
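To make the chunk-swapping idea concrete, here is a minimal sketch of a greedy longest-match phrase lookup. It is only a toy under stated assumptions: the table entries are hand-written from the examples in this post, the names `PHRASE_TABLE` and `translate_phrases` are made up for illustration, and nothing here is meant to reflect how Google Translate is actually implemented (a real system scores many competing segmentations with probabilities and also reorders the output).

```python
# Toy phrase table. Every entry is a hand-written assumption taken from the
# examples in this post, not real translation-model data.
PHRASE_TABLE = {
    ("i", "have", "a", "fever"): "mujhe ek bukhār hai",
    ("i", "love", "you"): "mujhe tumse pyār hai",
    ("a", "book"): "ek kitāb",
}


def translate_phrases(sentence, table=PHRASE_TABLE):
    """Greedily replace the longest known phrase at each position.

    Words with no table entry are passed through in upper case to mark
    them as still untranslated.
    """
    words = sentence.lower().split()
    longest = max(len(key) for key in table)
    output = []
    i = 0
    while i < len(words):
        # Try the biggest chunk first, then shrink until something matches.
        for size in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + size])
            if chunk in table:
                output.append(table[chunk])
                i += size
                break
        else:
            # No chunk starting here is known, so leave the word as-is.
            output.append(words[i].upper())
            i += 1
    return output


print(translate_phrases("I have a fever"))
print(translate_phrases("Rajesh has a book"))
```

The first call matches the whole sentence as one block. The second finds only “a book” in the table and leaves the name and “has” stranded, which is the problem the rest of this post is about.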

Which brings us back to the “Rajesh is a book” problem. “Rajesh has a book” is not a common phrase that the translator algorithm can just swap in, so it tries breaking it apart into chunks: maybe “Rajesh” and “has a book.” The name goes through all right, but as we’ve seen, “has a book” is not a phrase that has an easy Hindi equivalent in isolation. You have to do some stuff to the possessor as well, which is no longer part of this chunk. The same thing happens if you split it differently: say, “Rajesh has” and “a book.” It can translate “a book” just fine, but “Rajesh has” is now a problem, because it can’t translate “has” without knowing what is possessed, and “a book” is outside the chunk being examined.

What it might do is break it down until it gets translatable portions, and so ends up with “Rajesh”, “has”, and “a book.”

Rajesh ⟶ Rajesh
a book ⟶ ek kitāb
has ⟶ NULL

“has” goes to NULL, because there’s no direct translation. It might look in its knowledge and see a lot of “has” sentences in English that end in hai in Hindi. Knowing that Hindi sentences often end in the verb (tree-based translation), it seems like there might be reason to translate “has” as hai, which it does, though perhaps not very confidently. has ⟶ hai, and Rajesh is a book.

But what if we were to invoke some kind of semantic category? Turn “a book” into, say ALIENABLE PHYSICAL OBJECT. Seeing that, we could train the translator to know that “has ALIENABLE PHYSOBJ” should translate to “ke pās ALIENABLE PHYSOBJ hai.” Suddenly “has a book” has a distinct translation, because we know what kind of thing a book is. The process would then look something like this:

a book ⟶ ALIENABLE PHYSOBJ (store “a book” somewhere so you can get it back)
has a book ⟶ has ALIENABLE PHYSOBJ
Rajesh ⟶ Rajesh
has ALIENABLE PHYSOBJ ⟶ ke pās ALIENABLE PHYSOBJ hai
ke pās ALIENABLE PHYSOBJ hai ⟶ ke pās a book hai
a book ⟶ ek kitāb

It takes a little longer and you have to have this extra semantic layer in the middle, but you get it right in the end.
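The steps above can be sketched in a few lines of Python. This is a rough illustration, assuming a tiny hand-annotated lexicon: the dictionaries, the category labels other than ALIENABLE PHYSOBJ, and the function `translate_has` are all hypothetical, and the sketch ignores things a real system would need, such as gender agreement on kā and pronoun forms like mujhe.

```python
# Hypothetical semantic annotations: a human has labeled each noun phrase
# with the kind of possession it takes part in.
SEMANTIC_CLASS = {
    "a book": "ALIENABLE PHYSOBJ",
    "a fever": "ALIENABLE NONPHYSICAL",
    "a brother": "INALIENABLE",
}

# Toy noun-phrase translations, taken from the examples in this post.
NOUN_TRANSLATION = {
    "a book": "ek kitāb",
    "a fever": "ek bukhār",
    "a brother": "bhāī",
}

# One Hindi template per semantic class for English "OWNER has THING".
POSSESSION_TEMPLATE = {
    "ALIENABLE PHYSOBJ": "{owner} ke pās {thing} hai",
    "ALIENABLE NONPHYSICAL": "{owner} ko {thing} hai",
    "INALIENABLE": "{owner} kā {thing} hai",
}


def translate_has(sentence):
    """Translate 'OWNER has THING' by going through a semantic class."""
    owner, thing = sentence.rstrip(".").split(" has ", 1)
    category = SEMANTIC_CLASS[thing]           # a book -> ALIENABLE PHYSOBJ
    template = POSSESSION_TEMPLATE[category]   # has X -> ke pās X hai
    return template.format(owner=owner, thing=NOUN_TRANSLATION[thing])


print(translate_has("Rajesh has a book"))
print(translate_has("Rajesh has a fever"))
print(translate_has("Rajesh has a brother"))
```

The catch is the first dictionary: somebody has to fill it in by hand, which is exactly the human-in-the-loop cost that a purely statistical system is trying to avoid.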

Basically, the machine currently fails at this task because we haven’t cracked the question of meaning yet. The computer doesn’t know that a book is a physical object that can be given away or that a fever is an impermanent affliction or that your relative will always be your relative. That requires a human to go in there and annotate those things as such. The holy grail of machine translation is to be high quality (correct), general domain (you can talk about anything), and machine exclusive (you don’t need a person to either format things before it’s translated, or to fix it up afterwards). So far, we can usually hit two out of three with the most advanced computational linguistic techniques. Statistics do very well in some cases, especially between languages where there are very large parallel corpora that allow things to be restricted to a big but closed set of phrases and word chunks. However, this is not a general solution, and has to be tweaked for each language pair, and we usually still have to have a human in the loop to come in and clean things up, so that poor Rajesh isn’t a book.

But we’re working on that.

Posted in Language, Science | Tagged computational linguistics, linguistics, statistics, translation | Leave a comment

A Big-Ass Chart of Indo-European Languages

Posted on December 1, 2013 by nkrishna

Other people make and eat massive amounts of food over the Thanksgiving weekend. I did that, too, but I also made a chart. A big-ass chart.

There are many ways to chart a language family. You can do the traditional tree view showing genetic relationships. You can show a map of the distribution of various languages and subfamilies. You can view each one individually in terms of its internal history.

Inspired by a brief discussion on Tumblr, I attempted to do all three. Someone asked what Germanic languages were contemporaneous with Latin. I found out the answer, but not before having to read through three Wikipedia articles. I started thinking about ways to represent such a question graphically and came up with this (previews below).

In this chart, done in LaTeX, the linguist’s best friend, I attempted to capture the genetic relationships between the various Indo-European languages, the relative locations they occupied in the Indo-European sprachraum (at least before the age of exploration), and the time periods in which each language flourished.

How to read this chart:

Languages on a red background are extinct.
Languages on a green background are still extant.
Languages on a yellow background have no native speakers but are still in use as liturgical or scholarly languages in certain traditions.
Read from left to right, the chart shows the languages from west to east based on the center of each one’s speaking area (i.e. Celtic is the westernmost subfamily, Tocharian the easternmost). If two languages occupied the same longitudinal area within their subgroup, the more northerly one is on the left and the more southerly one is on the right (thus Baltic is to the left of Slavic, since they occupied more or less the same east-west area, but Baltic is on average more northerly).
The date at which the language first appears on the chart is the date at which linguists hypothesize that language existed as a distinct idiom. This may be the same as the date of first attestation, but is not necessarily, especially with the older languages. With some of those, the date is rather speculative.
When a language’s children appear on the chart, you can assume that the parent language went extinct at about that point. If a language had no living children, the line extends downward from that language to the point of extinction.
The time-scale is very loosely logarithmic; centuries pass much slower as you get to the bottom of the chart.

I chose Indo-European to try this on as it’s the language family I’m most familiar with (I think all but one of the languages I know much of anything about are Indo-European), and because most branches and subfamilies have been well-placed in space and time at this point. I assumed the Pontic-Caspian hypothesis of Indo-European origins; quite simply, it’s the one that makes the most sense to me and the date it gives also allowed the time scale to look reasonable in chart form.

This chart doesn’t include all the Indo-European languages–that would take forever and many of the smaller ones we don’t know much about (the Slavic family alone would take weeks to catalogue), but I think you can get a good picture of when and where diversifications happened within the Indo-European family of languages.

Full version here: 8267×1426 px

UPDATE:

Fixed a few errors: added Romani, Romansh, fixed left-right direction of Eastern Iranian branch, status of Classical Armenian, spelling errors

Posted in History, Language | Tagged chart, indo-european, language family, linguistics, tree | Leave a comment

Book Review — Holy Sh*t!: A Brief History of Swearing

Posted on November 3, 2013 by nkrishna

Author: Melissa Mohr
Publisher: Oxford University Press

Every once in a while you come across a book on a bookstore shelf that catches your eye simply because it’s got a dirty word right in the title. So it was with me and this one, which is really quite a good data point in favor of one of the book’s primary arguments: that swear words and taboo language speak to us on a fundamentally different level from other language.

Though brief, this history of swearing is pretty comprehensive, at least for English. Mohr (whose author picture on the back jacket shows her with her young son–that must have been fun trying to explain mom’s latest project) divides swears and cursing into two categories, the Holy (“Oh my god!”, “Sweet Jesus”, “Damn you”, etc.) and the Shit (shit, fuck, asshole, and the like). If you found yourself reacting more strongly to the latter than the former, that’s because, according to Mohr, we’re living in the age of the Shit right now. Which is to say that language referencing obscenity and bodily functions has greater taboo force than the other kind of historically offensive language, language that references theology.

Continue reading →

Posted in Books, Language | Tagged book review, holy shit, language, swearing | Leave a comment

Priorities

Posted on October 30, 2013 by nkrishna

Apparently we Gen-Yers care more about “a fulfilling career” than “a secure career”.

Cal Newport points out that “follow your passion” is a catchphrase that has only gotten going in the last 20 years, according to Google’s Ngram viewer, a tool that shows how prominently a given phrase appears in English print over any period of time. The same Ngram viewer shows that the phrase “a secure career” has gone out of style, just as the phrase “a fulfilling career” has gotten hot.

Images follow. Click the link to see them.

While the article is kind of ambivalent about what this statistic means, many of the comments regard the idea that fulfillment be valued over security with a significant amount of derision. Yeah, I know, never read the bottom half of the Internet, but why is the idea that seeking fulfillment in your livelihood is somehow bad (or at least to be valued less than security) so prevalent?

This is interesting to me now, because reading that article came at a time when my life is at a confluence. To wit:

I have two jobs–software engineer and graduate student. Engineer pays pretty well, grad student pays marginally above the poverty line. However, combining those, I’m living pretty high, easily within the top quarter for income-earners in the United States. In terms of actual work content, my engineering work I can take or leave, though I try to do the best I can at it. It’s just often not particularly exciting and sometimes very frustrating. My doctoral research, on the other hand, gets the blood flowing like nothing else besides writing (which, not coincidentally, is a large part of a Ph.D. program). Engineering is a secure career–people are always going to need halfway-literate code monkeys, and will be willing to pay them well. Research is a fulfilling career–I get to meet people who are excited to learn about the work I’m doing, have opportunities for great collaborations, and can end up adding something concrete to a growing and exciting field.

This can’t last.

Continue reading →

Posted in Personal | Tagged personal, work | Leave a comment

Burgers, metal, and religious offense

Posted on October 4, 2013 by nkrishna

Really, really kicking myself for never visiting Kuma’s Corner when I was living in Chicago.

Continue reading →

Posted in Music, Religion | Tagged food, ghost, kuma's corner, metal, pz myers, religious privilege | Leave a comment

Pisa Wrap-up

Posted on September 30, 2013 by nkrishna

I haven’t blogged in a while, which is nothing new (not that anyone reads this), but this time I have a reason beyond my own laziness: my doctoral advisor invited me to a conference in Pisa, Italy, which is where I was for the past week.

This was my first time in Italy and only my second in Europe outside of an airport. The conference was great. Some very interesting work was presented and I met some people who were quite interested in my doctoral research, despite the fact that I haven’t really started doing it yet. Some of the best parts about conferences are when you just get to talk to people on an informal level and hear their thoughts about a field that you already have in common. There’s some really exciting and diverse research going on in computational linguistics right now.

I tried to make the most of sightseeing while I was there. Most of that revolved around the Leaning Tower and the Cathedral and Baptistry in the Piazza dei Miracoli, which was just a short walk from my hotel. Even for an atheist, religious structures can be impressive in their own right.

The acoustics of the Baptistry are a remarkable feat of engineering where even something as simple as a toe tap echoes like a gunshot.

And of course, the Leaning Tower of Pisa has the association with Galileo’s alleged gravitational experiments, which makes it interesting from a scientific perspective.

19th century plaque commemorating Galileo’s supposed experiment

I also took a short trip out to Lucca, a medieval walled city about a half hour from Pisa. I’m particularly happy about this because I was able to negotiate my way there, to a tower, a bookstore, through lunch, and back to the bus station, all in Italian, which I had only started learning a week before.

Panorama of Lucca from the top of the Torre delle Ore (clock tower), looking west

Pisa and the surrounding area are great, and I would highly recommend them to anyone.

Posted in Personal, Research | Tagged academia, grad school, italy, travel | Leave a comment
