The Science Thread

Warning: Massive wall of text ahead.

Whew! I’ve now read that piece well enough to begin to have a handle on it. (With the caveat that I’ll never have a handle on the technical aspects of the speech analysis.)

I happen to know a little about speech-processing research, and these authors have taken a novel approach to the subject. If I may summarize: They note the existence of a number of statistical ‘laws’ concerning written language, three of which they explored in this paper:
–Zipf’s law, which concerns word frequency. This law states that, for a piece of written prose of substantial size, the most common word occurs twice as often as the second most common word, three times as often as the third most common word, etc.
–Heaps’ law, which states that 1) the number of unique words in an essay is a power function of the total word count of the essay, and 2) the exponent of that power function is a fraction between 0 and 1. Thus, the function is negatively accelerated and monotonic (ie, it rises rapidly at first, then its growth slows until the curve is almost flat). This is a fancy way to say that, relatively speaking, the longer an essay is, the fewer new words it adds. (As an aside, Zipf’s law is a power function as well.)
–The brevity law, which states that the length of a word is inversely correlated with its frequency of use (ie, words we use a lot tend to be shorter than words we don’t use as often).
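For reference, here are the standard textbook forms of these three laws (conventional notation; the exact exponent values vary from corpus to corpus):

```latex
% Zipf's law: the frequency of the r-th most common word falls off as a power of its rank
f(r) \propto r^{-\alpha}, \qquad \alpha \approx 1

% Heaps' law: vocabulary size V grows as a fractional power of total word count N
V(N) = K N^{\beta}, \qquad 0 < \beta < 1

% Brevity law: word length \ell is negatively correlated with word frequency f
\mathrm{corr}(\ell, f) < 0
```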

All three of these are what are called scaling laws, meaning they describe how one aspect of a phenomenon changes (scales) along with another aspect of the phenomenon. Scaling laws are ubiquitous in nature. What is of particular interest is when a scaling law for one phenomenon is found to apply to an entirely different phenomenon, as this implies the existence of some sort of underlying ‘universal law.’ For example, Zipf’s law has been found to apply to many other rankings unrelated to language, such as the population ranks of cities in various countries, corporation sizes, income rankings, and ranks of the number of people watching the same TV channel. (Full disclosure: I cribbed these examples straight from the Wiki article on Zipf’s law.)
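If you want to see these laws for yourself, here’s a minimal Python sketch (my own toy illustration, nothing from the paper) that tallies word-rank frequencies and vocabulary growth for any plain-text file; the filename at the bottom is just a placeholder:

```python
import re
from collections import Counter

def zipf_and_heaps(path):
    """Toy check of Zipf's and Heaps' laws on a plain-text file."""
    with open(path, encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())

    # Zipf: the r-th ranked word's frequency should fall off roughly as 1/r,
    # so rank * frequency should stay roughly constant for the top words.
    counts = Counter(words)
    for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
        print(f"rank {rank:2d}: {word!r:12s} freq={freq:6d}  rank*freq={rank * freq}")

    # Heaps: unique-word count V should grow like K * N**beta with 0 < beta < 1,
    # i.e., vocabulary growth slows as the text gets longer.
    seen = set()
    checkpoint = max(1, len(words) // 10)
    print("\n(total words, unique words) at checkpoints:")
    for n, w in enumerate(words, start=1):
        seen.add(w)
        if n % checkpoint == 0:
            print(f"  N={n:8d}  V={len(seen):8d}")

if __name__ == "__main__":
    zipf_and_heaps("sample_text.txt")  # placeholder filename
```

Run it on any decent-sized .txt file and you should see both regularities emerge with no tuning at all.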

Getting back to language-based communication, here’s the rub: All of these laws were identified in, and thus known to apply to, written communication. What was not known is whether they also apply to oral communication (ie, speech). Now, it might seem trivially simple to test whether they apply to speech production—just get a written transcript of speech and analyze it. Easy-peasy, right?

Unfortunately, it’s not that simple. The problem is one of segmentation—identifying where one word stops and another starts. In written communication, this is trivial—we put a space between the last letter of one word and the first letter of the next. But speech is not composed of letters; rather, it is composed of phonemes. Phonemes are the ‘building block sounds’ of speech. Every spoken language has its own set of phonemes (some languages have certain phonemes in common, of course) that it uses to construct every spoken word in its lexicon.

For example, /b/ and /p/ are phonemes in English; they are the reason the words /b/at and /p/at are different words to our ears. That may seem obvious—how could anyone not hear ‘bat’ and ‘pat’ as different words?—but this obviousness reflects our native-language bias. There are other languages that do not employ these sounds as separate phonemes, and speakers of those languages have a great deal of difficulty hearing bat and pat as different words. This becomes readily apparent when you consider certain stereotypic word confusions; eg, a native French speaker who pronounces Walk the dog as Walk zee dog (French doesn’t have our /th/ sounds as phonemes, so /z/ gets swapped in), or a native Japanese speaker who pronounces Fried rice as Flied lice (Japanese doesn’t treat /r/ and /l/ as separate phonemes). Likewise, as native English speakers, we find some of the glottal and pharyngeal phonemes in Arabic perplexing and hard to discern, and are even more baffled by the ‘click’ phonemes employed in certain African languages.

OK, so what? What does this have to do with analyzing speech for evidence of Zipf’s law, Heaps’ law, etc? As I said, the issue concerns segmentation. While a native speaker segments speech effortlessly, it turns out that, when analyzed ‘objectively’–ie, by looking at the speech signal the way it shows up on an oscilloscope, in terms of its acoustic energy, amplitude, frequency, etc–the physical properties of speech do not segment in a manner consistent with how we hear it. That is, when listening to speech, we effortlessly hear phonemes sequentially constructing words, and hear empty spaces between words. But the acoustic signal of speech looks very little like this. For example, certain phonemes are produced virtually simultaneously (this is called coarticulation). Further, the articulation of a given phoneme (ie, with respect to its acoustic-energy properties) is heavily influenced by the nature of the phonemes that precede and follow it. Finally, there is often no demarcation–no ‘dead space’–between words in the acoustic signal.

This fact—that the acoustic signal of speech does not segment in a manner consistent with its semantic content—is the crux of the dilemma facing the authors of the present study. They want to ascertain whether the acoustics of speech follow the same scaling laws as written prose. But note that these statistical scaling laws are necessarily dependent upon how the signal is segmented–upon precisely what it is that one counts when adding up event frequencies. As mentioned, segmenting written prose is trivially easy. But for purposes of statistical analysis, how should speech be segmented? One obvious answer is to simply have a native speaker of the language listen to the speech and demarcate the acoustic signal into respective words. But note that this is simply reducing speech to written prose, and thus would not tell you anything you didn’t already know (as written prose has already been well-studied in this regard). Further, such listener-based segmentation is inescapably influenced by cognitive biases that result in hearing distinctions in the acoustic signal that, objectively speaking, simply aren’t there.

To get around this dilemma, the authors elected to approach speech analysis from a purely acoustic perspective. That is, they treated the acoustic signal of speech as simply a series of energy bursts, and analyzed those bursts without regard to how they (the bursts) were related to semantically meaningful properties of the signal. In other words, they made no attempt to divide the signal into words, or even into phonemes; rather, they simply looked at the signal as a bunch of squiggly lines corresponding to bursts of energy of varying amounts, and then subdivided those bursts into discrete, arbitrary units, each of a different size (with respect to the amount of energy contained therein). These energy units were then analyzed with respect to their frequencies of occurrence to see whether Zipf’s law, Heaps’ law, and the brevity law would apply.
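As I understand it, the spirit of the approach is something like the following toy sketch (my own drastic simplification, not the authors’ actual pipeline; the frame length and threshold are arbitrary): compute the signal’s short-time energy, call everything above a threshold a ‘burst,’ and then treat those bursts the way you’d treat words in a text.

```python
import numpy as np

def energy_bursts(signal, frame_len=256, threshold_frac=0.05):
    """Toy segmentation of an audio signal into supra-threshold energy bursts.

    A simplified illustration of threshold-based segmentation, not the authors'
    actual method. `signal` is a 1-D array of audio samples.
    """
    # Short-time energy: sum of squared samples in non-overlapping frames
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).sum(axis=1)

    # A frame counts as 'active' if its energy exceeds a fraction of the maximum
    active = energy > threshold_frac * energy.max()

    # Group consecutive active frames into bursts; record each burst's total energy
    bursts, current = [], 0.0
    for e, a in zip(energy, active):
        if a:
            current += e
        elif current > 0:
            bursts.append(current)
            current = 0.0
    if current > 0:
        bursts.append(current)
    return bursts  # burst 'sizes' play the role that words play in written text

# Quick demo on fake data: quiet noise with two louder stretches
rng = np.random.default_rng(0)
fake = rng.normal(0, 0.01, 80_000)
fake[10_000:15_000] += rng.normal(0, 0.3, 5_000)
fake[40_000:43_000] += rng.normal(0, 0.5, 3_000)
print(f"found {len(energy_bursts(fake))} bursts")
```

It’s then the rank-frequency and size statistics of units like these that get tested against Zipf’s law, Heaps’ law, and the brevity law.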

And sure enough, they did. That is, the authors found that when speech was divided into arbitrary (but objectively real) units of energy, the rate of production of those units corresponded to all three of the scaling laws in question. The implications of this finding are well-presented in the Discussion section of the paper.

OK, I’m exhausted. If this megapost generates any interest, I’d be happy to share other thoughts I have. Either way, my thanks to @Aragorn for sharing such an interesting study with us.

4 Likes

I’m not sure I am following–would you mind expanding your post please?

Damn! You weren’t kidding about the wall! Not to worry though, your post was extraordinarily good…and exactly the kind of thing I was hoping for from this thread! I need to reread everything, because I am not sure I have a good handle on it yet, but I will be back on here late this evening.

And yes, absolutely interested in other thoughts of yours! Thanks for an excellent post.

1 Like

I am only saying that as deeper study goes on, there are times when accepted science is debunked by the new. Thus, while I hold it in regard, there are times when previous observations or viewpoints come back around.

Eggs good for you, eggs bad for you because of cholesterol, body produces cholesterol not food intake, eggs cheapest form of quality protein.

Aha, ok yes that makes sense. I agree–the beauty of science is that it is evolving and self-correcting; the downside is that it often doesn’t get things right on the first, second, third, or fourth try lol. Even the good answers are evolving.

1 Like

OK, I’ll throw a topic into the science hopper.

All of us TNers are students of human movement–either functional (eg, sports-related) or fitness-related. One of the enduring puzzles facing scientists who study movement is, How is movement organized and controlled?

Consider something as simple as tapping your finger on a table in a comfortable, steady, metronome-like manner. Question: Why does your finger move in this rhythmic fashion? You might answer, That’s easy–because the relevant muscles are contracting and relaxing in a rhythmic fashion. OK, but why are the muscles contracting/relaxing in this rhythmic fashion? Well, you respond, that’s because the impulses from the peripheral nerves which control them are firing in this rhythmic fashion. Fine, but why are the peripheral nerves firing in rhythmic fashion? To which you answer, because their CNS ‘controlling neurons’ are firing in a rhythmic fashion. OK, but why are these CNS neurons firing in rhythmic fashion? And you answer, Because…because…

And at this point, we realize we’re stuck–we’ve dug ourselves into a hole with seemingly no way out. That is, we recognize that we haven’t really been explaining anything about the control and organization of rhythmic behavior at all. Rather, instead of explaining the origins of rhythmicity, what we’ve been doing is passing the buck–‘kicking the can’ up the motor-control pathway.

The rub is this: To meaningfully account for rhythmic movement, at some point you’ve got to pay the piper. That is, you have to find a way to explain how rhythmicity comes to pass, but do so without appeal to rhythmic input. In other words, you’ve got to explain how rhythmicity arises from non-rhythmicity–how a rhythmic output can emerge from a non-rhythmic input. In this regard, some movement scientists have looked to dynamical systems theory (aka chaos theory) for inspiration and ideas. One of those scientists is JAS Kelso. In 1984, he conducted an extremely simple experiment. He had people move their hands (at the wrist joint) up and down in a simple rhythmic fashion–as one hand went up, the other went down. He then had them slowly speed up the rate at which they conducted this movement. The experiment couldn’t be simpler, but the results couldn’t be more profound. Here’s the link:

https://scholar.google.com/scholar?q=kelso+1984+phase+transitions+and+critical+behavior+in+human+bimanual&hl=en&as_sdt=0&as_vis=1&oi=scholart&sa=X&ved=0ahUKEwjHu7-j-8zTAhVoyoMKHeUCCeYQgQMIIDAA

Once you open it, click on the PDF link to the right. As you scroll through the document, you’ll see that it’s entitled Speech Research. Don’t let that put you off. Go to document page 87 (it’s PDF page 94 for me), and you will find the first page of an article entitled Phase transitions and critical behavior in human bimanual coordination, by Kelso.

Once we get through this, I will follow up with related research that will blow your freaking mind.

1 Like

Not sure if I quite get it, but that seems pretty damn cool so far.

2 Likes

Posting from my phone because my computer apparently will not recognize my internet connection, but that is VERY cool. Kelso was an O.G. of movement research and a badass.

I am currently writing programs for some athletes, and it is amazing how much we take for granted and how much we still don’t understand about movement at this level. In fact, I would wager that in some ways we know more about the process of motor learning than about some seemingly simple aspects of movement.

Can’t log in to read the full Kelso paper until I get my laptop to cooperate, since I don’t remember my login creds haha.

1 Like

Here’s something I do have access to on mobile. A couple favorite articles from early last year.

This is a fascinating peek into one of the possible mechanisms of non-genomic inheritance. This is the inter-generational inheritance of NON-chromosomal traits. In other words, inheriting cues about the environment or metabolism. Epigenetic signal inheritance, once thought impossible, and also one reason why ‘we are more than our gene sequence’.

The term epigenetics gets thrown around a lot, and is still somewhat of a catch-all term, but for any interested readers not closely acquainted with genetics, the word “epigenetic” essentially means ‘environmental feedback onto gene regulation’. This does NOT mean a change in DNA sequence in most cases. Think of a dimmer switch for lights–the light bulb already exists, and you don’t change it out or add new circuitry. Instead, you control how bright or dim the bulb is when it’s turned on. Epigenetics is similar–what it means is that the expression or control of genes already present is modified by the external environment (or, in the case of our gut microbiota feeding back into the brain or body, by symbiotic organisms).

Imagine designing an environmental relay system to determine not only when to turn our hypothetical light on, but also how much light is emitted when it gets turned on, or even what KIND of light is turned on…say, when sunset hits outside, or when it’s rainy and cloudy.

Now, this was previously thought impossible in the context of reproduction. The exception, of course, would be prenatal development and the mother’s nutrition (or drug use), which was known to interfere with or advance fetal development. However, that is not epigenetics–the reason is that the nutrients or drugs act directly on the developing baby by crossing the placenta and entering the baby’s circulatory system (remember, substances in mom’s bloodstream can cross the placenta into baby’s circulation during pregnancy).

Here, on the other hand, we see some of the very first work helping to determine a mechanism for inheriting non-DNA traits independent of direct contact with baby’s blood circulation…because the traits came from the father.

In two separate studies, the authors show that a father mouse’s diet can pass metabolic abnormalities on to his offspring via sperm tRNA. The Chinese team fed the fathers in the experimental group a high fat diet, then injected their sperm directly into unfertilized eggs. The US team fed the fathers either a normal or a low protein diet, then used in vitro fertilization. The US team also compared RNAs from sperm in the testis to those of sperm further “downstream,” and found that only the downstream sperm contained the anomalies. The offspring were fed a normal diet and remained lean, BUT in the Chinese team’s work they showed abnormal glucose uptake and insulin insensitivity, and in the US team’s work they showed elevated expression of genes associated with lipids and cholesterol.

This work has a bunch of incredible findings, but perhaps the most incredible–not to mention disconcerting–is the knowledge that this may mean a “hangover” effect for the obesity epidemic: even if the obesity epidemic were solved tomorrow via drugs, subsequent offspring could still be ingrained with the shitty metabolic regulation of their fathers, because their fathers ate a shit diet. The obesity epidemic has a potential vector for “contagion”-style transmission.

Links to the full-text articles first; the last link is to the executive summary, which should be open access (I think).

Chinese team’s work:
http://science.sciencemag.org/content/351/6271/397.full

US team:
http://science.sciencemag.org/content/351/6271/391.full

Summary of both findings in plain English:
http://science.sciencemag.org/content/351/6268/13.full

3 Likes

BTW, with regard to the motor-control study I posted above: You can easily replicate it yourself. Simply put your hands palms-down on a table, and tap your hands (or just your index fingers) in an out-of-phase (ie, alternating) manner. Start off at a comfortable ‘could do this all day’ pace, then slowly speed up. Keep trying to go faster and faster, and take note of what happens to the phase-relation between your two hands/fingers after you pass a certain point in the speeding-up process.

OK, so I promised to respond to this, and after a really busy last week at work, here we are.

  1. The phrase “machine learning” has become a trendy catch-all term for many different forms of statistical modeling. Logistic regression, for example, has been around for several decades and is a relatively simple statistical model that we use all the time. Random forests, gradient boosting, and neural networks are more complex and (IMO) much more aligned with what I think of as “machine learning” algorithms.

  2. As someone who works in the trenches on some “big data” projects derived from electronic health records (at least fairly similar to the data that would have been used for this project), I’d like to throw in a probably-not-that-surprising caveat: these EHR-derived datasets usually have a ton of problems in comparison to pre-designed research studies. I’ve been given big EHR data dumps, run an entire massive analysis, shared the results with the PI, had them say “there’s no way that’s right,” and we’ve had to deputize junior staff (residents or medical students) to hand-validate the records. They almost always come back with a TON of errors, like missing data or things that are outright incorrect. EHR datasets are typically accurate in gathering things like admission dates, discharge dates, procedure dates, and death dates (for people who died in the hospital; this is much more troubling for people who died outside the hospital). Lab data, functional testing, and diagnostic studies (echocardiograms, electrocardiograms, catheterizations, MRI, CT scans) are very hit or miss.

  3. With that said, it would still be extremely interesting to know that even flawed and messy EHR data offered meaningful predictive information. I am a little worried about the potential for outcome misclassification (again, having worked with a lot of EHR and ICD-9/ICD-10 derived datasets, I can imagine plenty of people who had a CVD event without a recorded code, and vice versa), but for the time being, I’ll let that slide.

  4. The statistical methods get pretty nerdy, and I’m not sure how much nitty-gritty is worth delving into there, but I can make a few useful points about the methods and their implications without getting too technical.

  5. The first couple sentences of the Discussion say this:

“Compared to an established AHA/ACC risk prediction algorithm, we found all machine-learning algorithms tested were better at identifying individuals who will develop CVD and those that will not. Unlike established approaches to risk prediction, the machine-learning methods used were not limited to a small set of risk factors, and incorporated more pre-existing medical conditions.”

Yes, but…

The AHA/ACC risk prediction algorithm is intentionally simple, with few required inputs (just age, total cholesterol, HDL cholesterol, smoking status, blood pressure, and diabetes), because it’s meant to be something that the family-practice doc can tally up quickly in their office with a few obvious variables. It leaves out a number of other factors that have evidence of association with CVD risk (several of which are mentioned in the ML methods) for just that reason: if you make any “risk calculator” too complicated, people will not use it (the medical literature is littered with would-be “risk calculators” that never penetrate routine clinical practice because they’re too complicated or time-consuming for most clinicians to bother with).
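Just to make the “few required inputs” point concrete, here’s what a checklist-style score looks like structurally in code. This is a generic logistic-style toy with completely invented weights, NOT the actual AHA/ACC pooled-cohort equations (which have their own published form and coefficients):

```python
from math import exp

def toy_risk_score(age, total_chol, hdl, systolic_bp, smoker, diabetic):
    """Structural illustration of a few-input risk calculator.

    All weights below are invented placeholders for illustration only; this is
    NOT the AHA/ACC pooled-cohort model. The point is simply that six readily
    available inputs are all a clinician has to supply.
    """
    score = (0.05 * (age - 50)
             + 0.01 * (total_chol - 200)
             - 0.02 * (hdl - 50)
             + 0.02 * (systolic_bp - 120)
             + 0.6 * int(smoker)
             + 0.5 * int(diabetic)
             - 2.5)
    return 1 / (1 + exp(-score))  # toy '10-year risk' on a 0-1 scale

# Example: a 55-year-old non-smoker, non-diabetic with decent numbers
print(f"{toy_risk_score(55, 190, 55, 128, smoker=False, diabetic=False):.1%}")
```

Compare that with an ML model that wants 28+ inputs pulled from the EHR, and you can see why simplicity wins at the point of care.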

It’s hardly a surprise that the ML methods picked up some more “stuff” that adds predictive value for CVD risk; I would guess that the people who made the AHA/ACC risk prediction algorithm are not surprised by most of the additional stuff the ML methods included, but knew that adding 10 more factors to a supposed-to-be-simple-for-daily-use risk calculator would have yielded only a modest improvement in performance, not enough to justify the increased complexity, which would limit its utility in daily practice.

Incidentally, while the article paints a very rosy picture of ML, one of the first things I noticed was how little the ML methods actually improved risk prediction (the neural-network model, with the best results, had 67.5% sensitivity and 18.4% PPV; the ACC/AHA model had 62.7% sensitivity and 17.1% PPV). The incremental increase is “statistically significant” and would have a profound impact if fully implemented at the population level; a 1-2% improvement in prediction in a system with hundreds of thousands of people is non-trivial and, as ED notes, translates to thousands of people being treated at a level more appropriate to their true CVD risk. However, those results may not be impressive enough to convince the average family-practice doc to abandon their clinical intuition and the simple checklist for assigning patients to low, medium, or high CVD risk in favor of some fancypants ML algorithm that requires them to put in 28 different things and spits out an answer without a great explanation of how it got there.
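To put those percentage differences in perspective, here’s a quick back-of-the-envelope sketch. The two sensitivity figures are the ones quoted above; the cohort size and event rate are made-up round numbers purely for illustration, not the study’s actual counts:

```python
# Back-of-the-envelope: what a few points of sensitivity means at population scale.
cohort_size = 500_000          # invented round number, not the study's cohort
event_rate = 0.10              # assumed fraction who actually develop CVD (invented)
true_events = cohort_size * event_rate

sens_acc_aha = 0.627           # ACC/AHA model sensitivity (quoted above)
sens_nn = 0.675                # neural-network model sensitivity (quoted above)

print(f"True CVD events in cohort:           {true_events:,.0f}")
print(f"Flagged by the ACC/AHA-style model:  {true_events * sens_acc_aha:,.0f}")
print(f"Flagged by the neural-net model:     {true_events * sens_nn:,.0f}")
print(f"Additional high-risk people flagged: {true_events * (sens_nn - sens_acc_aha):,.0f}")
```

With those made-up denominators, the 4.8-point sensitivity gap works out to a couple of thousand extra people correctly flagged as high risk: meaningful at a health-system level, but not the kind of difference that jumps out at an individual clinician.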

Of course, the big-picture idea is that one day ML-derived models would be programmed into every doctor’s office and hospital EHR so there’s no need for the doc’s seven-item checklist to quickly compute a CVD risk estimate and decide whether to put them on a statin. The patient would arrive for their appointment, the doc would perform the examination, and the computer would spit out the patient’s 10-year cardiovascular risk and a prescription for therapy. That’s where things may be headed one day, and of course papers like this are a necessary first step to say “Hey, ML-derived modeling is better than even a doctor’s brain.”

OK, tbh I’m pretty fried and probably didn’t say this very well, might come back and edit some later.

4 Likes

:clap::clap::clap::clap:

1 Like

I was actually thinking more of the cadence I used to cycle at. It was always virtually the same; the only thing that changed was the gear I was operating the bike in. There were certain “sweet spots” though, like a low, mid, and high cadence that each felt best. There were also certain points, usually at the top end of each range, at which my legs would basically “spaz out” and lose it, flying off the pedals and skinning my shins.

Is that approximately what this is referring to? If not I’d be happy to drop back to reading, because I’m pretty sure I’m out of my depth here.

I think there is a very good chance the experience you had relates closely to the subject of the paper.

But when you get a chance, do the ‘tapping’ experiment, and see if you notice 1) whether, as you go faster and faster, there’s a point at which you ‘spaz out;’ and 2) what happens to the phase relationship between your hands/fingers as you continue to speed up.

A bit of an adjunct to this comment, and one of the reasons I am somewhat skeptical that ML’s adoption in clinical practice is coming any time soon:

Physicians will sometimes ignore high-quality evidence that “X” has no benefit because it just makes intuitive sense that it would have an effect, and some will continue to use/do “X” anyway, even in the face of conclusive evidence. Color me skeptical that those physicians will start deferring to the magic-box computer algorithm.

Personally, I think ML’s biggest near-term application is in what I will loosely refer to as “industry” to identify things like components that are likely to fail soon and warn that they need to be tested and/or replaced, and something like that may start to seep into clinical medicine: the doc won’t directly apply the newfangled risk calculator, but the magic-box algorithm will trigger a notification that this patient may benefit from XYZ, or if the doc orders a test, the magic-box says that it’s not needed because this patient has a low probability of condition ABC.

EDIT: I know this post comes off a little peevish towards physicians. No offense, @EyeDentist.

This is already happening.

2 Likes

I thought so, but it certainly still hasn’t achieved very widespread penetration.

That may change in the next 5-10 years, though.

1 Like

There is.

They begin to stutter and attempt to become parallel!

I’m beginning to see an analogy to quadruped gait transitions between walk, trot, and sprint, like when a dog or cat goes from trot to sprint and their front and rear limbs begin to act together (both front/both back), versus their gait at a trot, which is (one front + opposite back)/(other front + other opposite back).

1 Like

That is exactly what the study found! As you increase frequency, the out-of-phase coupling becomes unstable, and the system suddenly and spontaneously re-organizes into a different (but also stable) coupling, that being in-phase (or ‘parallel,’ to use your perfectly apt word).

If you look at the top of page 89 (PDF page 96 for me), you’ll see three figures. Each represents a different way of portraying the experiment’s results–ie, the sudden phase change that Kelso’s participants experienced as they sped up the frequency of their movements. And just as you described, you can see they experienced a brief period of ‘stutter,’ and then a sudden, spontaneous shift to in-phase tapping. Even if some of the verbiage in the figure captions is unfamiliar, the sudden and dramatic shift from one stable phase to a completely different (but stable) phase is readily apparent just by eyeballing the figures. Isn’t this amazing?
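For the mathematically inclined: the standard way this phase transition gets modeled is the Haken-Kelso-Bunz (HKB) equation, which was developed shortly afterward to describe exactly this experiment. Below is a minimal Python sketch (my own toy simulation; all parameter values are arbitrary) in which relative phase starts out anti-phase and spontaneously jumps to in-phase as the coupling ratio b/a is slowly lowered, the model’s stand-in for ‘speeding up’:

```python
import numpy as np

def simulate_hkb(a=1.0, b_start=1.0, b_end=0.0, steps=20_000, dt=0.01, noise=0.05):
    """Euler-Maruyama simulation of the HKB relative-phase equation:
        dphi/dt = -a*sin(phi) - 2*b*sin(2*phi) + noise
    Anti-phase (phi = pi) is stable only while b/a > 0.25; as b shrinks
    (standing in for faster movement), only in-phase (phi = 0) remains stable.
    """
    rng = np.random.default_rng(1)
    phi = np.pi  # start in anti-phase (hands alternating)
    trace = np.empty(steps)
    for i in range(steps):
        b = b_start + (b_end - b_start) * i / steps  # slowly 'speed up'
        dphi = -a * np.sin(phi) - 2 * b * np.sin(2 * phi)
        phi += dt * dphi + noise * np.sqrt(dt) * rng.normal()
        trace[i] = phi
    return trace

trace = simulate_hkb()
folded = np.abs(np.angle(np.exp(1j * trace)))  # fold phase into [0, pi]
print("mean |relative phase| early on :", round(float(folded[:2000].mean()), 2), "(~3.14, anti-phase)")
print("mean |relative phase| at the end:", round(float(folded[-2000:].mean()), 2), "(~0, in-phase)")
```

Plot the whole trace and you see exactly the signature in Kelso’s figures: stable anti-phase, a brief wobbly ‘stutter’ as the old pattern loses stability, then a sudden jump to stable in-phase.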

2 Likes

I love science:

1 Like