Learning Thoughtfully

Sunday, August 27, 2023

Analogy, Association and AI

A few months ago, a very interesting paper came out that tested GPT-3 on analogical reasoning problems. (There is an accessible article about it here.) Surprisingly for a system built entirely on correlation, GPT-3 performed quite well, outscoring the average human test-taker.

The study used both abstract (letter and number-based) and verbal analogies. While I can't comment on what might be going on with the abstract problems, an experience of mine may shed some light on the AI's verbal performance. I scored 800 on both the verbal portion of the SAT in 2000 and the verbal GRE in 2005, when both tests included analogies. For the record, my prep both times consisted of a Kaplan CD-ROM, which I used mainly for math, and the official practice test.

Taking the tests was an interesting experience. The easy and medium questions worked pretty much as intended. I recalled the meanings of the words, figured out the relationship between the words in the original pair, and marked the answer choice that would give a similar relationship in the target pair. (This is where explicit vocabulary training can help student performance.)

But after a certain level of difficulty, that strategy failed. A number of the harder questions contained at least one word I could not have explicitly defined. On those questions, I went by feel. Many of the words I couldn't define were still words I had encountered in reading. I had some sense of how they were used and how they related to other words, so I went by that, throwing in some other cues like etymology. Apparently, it worked. (At that level, the SAT and GRE pretty much become tests of how much you read.)

I relate this experience because association is fundamentally what GPT does. Although its detailed workings are a secret, we do know that this kind of AI uses linguistic correlation structures derived from enormous amounts of text. And somehow, that kind of correlational knowledge can enable a test-taker to solve analogy problems without relying on explicit deductive reasoning. It may be worth investigating whether generative AI is doing something similar and how far such a strategy can take it.
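
As a rough illustration of how association alone can answer an analogy, here is a minimal Python sketch of the vector-offset trick used with word embeddings such as word2vec. It is not a description of how GPT works internally, which is not public. The three dimensions and every vector value below are invented for the example, and `solve_analogy` is a hypothetical helper. A word's "meaning" is just its position relative to other words, and the relationship in the first pair becomes a direction that can be added to the third word.

```python
import math

# Hypothetical hand-made "association" vectors. Real word embeddings have
# hundreds of dimensions learned from co-occurrence in text; these three
# dimensions (young, canine, feline) are invented for illustration only.
VECTORS = {
    "puppy":  [1.0, 1.0, 0.0],
    "dog":    [0.0, 1.0, 0.0],
    "kitten": [1.0, 0.0, 1.0],
    "cat":    [0.0, 0.0, 1.0],
    "lion":   [0.0, 0.1, 0.9],
    "wolf":   [0.0, 0.9, 0.1],
}


def cosine(u, v):
    """Cosine similarity: how closely two vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(a, b, c, vectors=VECTORS):
    """Answer a : b :: c : ? by finding the word whose vector is closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(solve_analogy("puppy", "dog", "kitten"))  # cat
```

No definitions are involved: the answer falls out of which words sit near which, much like going by feel on a hard analogy question.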

Wednesday, April 10, 2019

Growth Mindset, Plausibility and Statistics

Sometimes a paper comes along whose main finding is totally plausible and, if true, would have important practical applications. You want to believe this result. And then you read the actual paper and everything falls apart.

The recent Canning et al. publication on growth mindset and racial achievement gaps is such a paper.

The authors examine the relationship between instructor beliefs about the malleability of intelligence and the achievement gap between underrepresented (URM) and non-URM students. After controlling for many factors, they find this gap is smaller if the instructor believes intelligence to be changeable, i.e. has a growth mindset. This is an intuitively reasonable result. Unfortunately, the paper gives us no reasons beyond intuitive reasonableness to believe it.

Here are the main problems with the paper.

Is there a there there? According to Canning et al., instructors who scored one standard deviation above the mean on a measure of growth mindset had a URM-nonURM achievement gap of 0.1 grade points. Those who scored one SD below the mean had an achievement gap of 0.19 grade points. So increasing instructor growth mindset by two standard deviations decreases the achievement gap by all of 0.09 grade points. The paper summarizes this result by saying, "the racial achievement gap was nearly twice as large in courses taught by college professors who endorsed fixed (versus growth) mindset beliefs about students’ ability." That sounds more impressive than a change of 0.09 grade points.
For anyone teaching introductory statistics, this is an excellent illustration of the difference between statistical and practical significance, as well as relative and absolute change.
No data is shown. At first glance, Fig. 1 in Canning et al. looks like a typical dynamite plunger plot showing the mean and some sort of error bar. This is a poor way to present data because it completely hides the distribution and can make totally different datasets look the same. A histogram or at least a box plot would be much more informative.
But it gets worse. A closer look at the caption of Fig. 1 reveals that it doesn't show any data at all. Rather, it displays predicted values from a complex statistical model incorporating numerous student-, instructor- and course-level variables. In fact, no figure in the paper displays any actual data. Only the outputs of statistical models are shown. Even the results discussed previously are modeled, not actual results.
Now, statistically adjusting for potential confounders is often an appropriate and useful thing to do. I am not against the practice. However, a publication should start with the data itself and then discuss any necessary adjustments. Otherwise, readers are essentially asked to take the authors' analyses on faith.
More predictors are not better. Canning et al. predicted student grades using a statistical model that included the following variables:
- faculty mindset (the actual variable being studied)
- student gender
- student race/ethnicity
- student first-generation status
- student SAT scores
- course enrollment
- course level
- faculty gender
- faculty race/ethnicity
- faculty age
- faculty years of teaching experience
- faculty tenure status
The problem with this level of thoroughness is that many of these variables are clearly not independent. Faculty age, teaching experience and tenure status are obviously highly correlated. Students from certain backgrounds may have lower SAT scores than others. Upper-division courses are usually smaller than lower-division ones.
All this matters because the use of correlated variables in a regression analysis, termed multicollinearity, can result in parameter estimates that are very sensitive to changes in the data or the choice of predictors. Essentially, the regression coefficients become uninterpretable. And since mindset is merely one of the 13 predictor variables in the model, its regression coefficient is just as affected as the others. Whether the predicted grade difference resulting from growth mindset is affected or not is not entirely clear. If predicted grades were obtained by simply plugging values into the model, they are not affected by multicollinearity. However, all results derived from model coefficients are affected.
None of the problems outlined here are unique to this study. There is a desperate need for better statistical analysis and data presentation in the education literature. We need to focus not on ever more sophisticated statistical techniques but on a solid understanding and use of the basics. Show the data. Use absolute change. Don't use techniques without understanding the assumptions behind them. These measures alone could prevent many promising but unreliable publications.
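
To make the absolute-versus-relative distinction concrete, here is a small Python sketch using the two gap values quoted above (0.19 and 0.1 grade points). The function name `describe_change` is just an illustrative choice.

```python
def describe_change(gap_fixed, gap_growth):
    """Return the absolute difference and the ratio between two achievement gaps."""
    absolute = gap_fixed - gap_growth  # difference in grade points
    ratio = gap_fixed / gap_growth     # "how many times as large"
    return absolute, ratio


absolute, ratio = describe_change(0.19, 0.10)
print(f"Absolute difference: {absolute:.2f} grade points")  # 0.09
print(f"Ratio: {ratio:.1f} times as large")                 # 1.9
print(f"Relative reduction: {absolute / 0.19:.0%}")         # 47%
```

The same 0.09 grade points can be described as "nearly twice as large" or "about 47% smaller," which is why the absolute figure belongs next to either phrasing.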
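
Why a mean and an error bar can hide the data is easiest to see with two small invented datasets (not data from Canning et al.). They have the same size, the same mean and the same standard deviation, so a dynamite plunger plot would draw them identically. A crude text histogram, sketched below in Python, tells them apart at once.

```python
import statistics
from collections import Counter

# Two invented datasets of equal size (8 values each).
spread_out = [4, 6, 4, 6, 4, 6, 4, 6]  # nobody scores 5
clustered = [3, 5, 5, 5, 5, 5, 5, 7]   # almost everybody scores 5

for name, data in [("spread_out", spread_out), ("clustered", clustered)]:
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # population standard deviation
    print(f"{name}: mean={mean}, SD={sd:.2f}")
    # A text histogram shows the distribution that mean +/- SD hides.
    counts = Counter(data)
    for value in sorted(counts):
        print(f"  {value}: {'#' * counts[value]}")
```

Both summaries print mean 5 and SD 1.00, yet one dataset has no fives at all and the other is mostly fives.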
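
The instability caused by multicollinearity can be reproduced with synthetic data. The Python sketch below is a minimal simulation under stated assumptions: two predictors that are nearly copies of each other (standing in, hypothetically, for something like faculty age and years of teaching experience), a true coefficient of 1 on each, and ordinary least squares solved through the normal equations. The helpers `fit_two_predictors` and `simulate` are illustrative names. Across random seeds, the individual coefficients typically wander far from 1, while their sum and the plugged-in predictions stay stable, matching the distinction above between coefficients and predicted grades.

```python
import random
import statistics


def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    m1, m2, my = statistics.fmean(x1), statistics.fmean(x2), statistics.fmean(y)
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # small relative to s11*s22 when x1, x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


def simulate(seed, n=50):
    """Synthetic data: x2 is x1 plus tiny noise; the true model is y = x1 + x2 + noise."""
    rng = random.Random(seed)
    x1 = [float(i) for i in range(n)]
    x2 = [a + rng.gauss(0, 0.1) for a in x1]
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    return x1, x2, y


for seed in range(3):
    x1, x2, y = simulate(seed)
    b0, b1, b2 = fit_two_predictors(x1, x2, y)
    prediction = b0 + 25 * (b1 + b2)  # predicted y at x1 = x2 = 25
    print(f"seed {seed}: b1={b1:6.2f}  b2={b2:6.2f}  "
          f"b1+b2={b1 + b2:5.2f}  prediction={prediction:6.2f}")
```

Reading either coefficient alone as "the effect" of its variable would be misleading here, even though the model predicts well.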

Sunday, August 5, 2018

A New Class

August is here and I am about to start teaching a new session of Math for Life Scientists, the class I have taught for the last four summers. This year, I want to deliberately implement thoughtful learning. Here are my plans, pristine before their collision with reality.

I'm going to pause more. It turns out that the "what just happened?" processing time I started using last fall based on intuition is actually a technique called the "pause procedure" with a fairly solid research base dating back to the 1970s. (See, for example, this and this.) In the original studies, students were just asked to compare their notes with a neighbor's, although some instructors assign more complex tasks. Since time is an absolute requirement for thoughtful learning, the pause procedure fits this approach perfectly. I plan to mostly use pauses to have students compare notes or discuss confusing points, as in the original studies, and explicitly ask them to come up with questions.
I'm going to avoid multiple-choice questions, as well as peer discussion as a fallback when no one answers a question. These strategies resulted in the least participatory classes I have ever taught and had no discernible effect on student learning. There's (probably) nothing wrong with peer discussion, but only if it's planned from the start. Otherwise, it rewards nonparticipation. If students are truly stumped by one of my questions, I can scaffold.
I'm going to do even more retrieval practice, especially asking students to write summaries at the end of class. If it's not too much trouble, I might try the three-step "brain, notes, other students" color-coded approach, but that might be too much to fit into an already packed summer schedule.
Finally, I'm going to try to give students more interesting conceptual questions to think about during class and breaks. This means explicitly asking them not to read the book before it's assigned. That shouldn't be a hard sell.

We'll see how this goes.

Wednesday, May 23, 2018

May 2018 Link Roundup

A few items I've come across that merit being pointed out.

While I'm somewhat biased against multiple-choice questions, RetrievalPractice.org has a discussion of how they can actually be good learning tools. The key is that the incorrect alternatives must be plausible enough to make the student retrieve information about each one.
If you really want to supercharge multiple-choice questions, try this idea from The Effortful Educator. Students comment on why someone might choose an incorrect answer, how the question might be modified to make an incorrect answer correct, and more. If you use clicker questions, this activity might make good homework.
A promising type of math practice -- same surface, different deep structure problems.
Finally, Diana Senechal as usual embodies thoughtful learning, this time in a post about praise in American and Hungarian schools.

What kind of praise is appropriate in the classroom? Those of the “growth mindset” persuasion often say that teachers should praise students for effort, not for ability or accomplishment. That strikes me as too rigid; different situations call for different kinds of praise. Sometimes students do need to hear that they have a particular ability or that their work stands out. What matters is that the teacher praise and criticize thoughtfully, not automatically, and that she avoid using praise (or criticism) as a way of exerting control. When students depend too much on teachers’ praise or take it too much to heart, they lose their own critical sense. A teacher’s praise should help students find their way.

I'm on a reading binge about motivation and curiosity, so expect posts about that soon.

Tuesday, May 15, 2018

Training for Flexible Teaching

For about the last year, I've been working with a speech and voice coach and recently started taking an acting class she teaches at UCLA Extension. Halfway through the course, I am most struck not by any particular technique or exercise but by how many ways there are to do one thing.

Several weeks ago, we started learning short monologues. We then explored each in three different ways ("body NRGs" in the jargon of the method we are using). Sometimes, the teacher asks students performing the monologues to do them in a different way, saying, "What if your director asks for something completely different?" Many of these experiments produce results that are unexpected but make sense. Even the ones that don't often reveal something about a piece that wasn't apparent otherwise. The point is to explore many possibilities before settling on one and be able to respond creatively to whatever happens.

In math and science teaching, at least in higher education, we often seem to look for the One Best Way to teach a topic. But life interferes. Sometimes, you write out careful notes and a student asks an insightful question, which leads to a twenty-minute discussion. Sometimes, your planned ten-minute review becomes the whole lesson because that's what the students turn out to need. Sometimes, you approach a student to help them and find that they are too frustrated or upset to focus. Being a good teacher means being able to respond to these circumstances in the moment, reacting flexibly while still accomplishing what you need to accomplish. This means that teacher preparation at every level, from formal coursework to writing out your notes the night before a class, needs to focus on developing flexibility. The point of preparation is not to know exactly what you will do but to be able to respond to whatever comes up.

Since I'm not teaching this quarter, I'm trying to implement a flexibility-building type of preparation with the undergraduate learning assistants I supervise. Right now, I'm just trying to ask them for multiple possible problems and solutions that might come up as they help students. In the future, I may develop more methods, but training for flexibility looks like a good concept.

Sunday, April 15, 2018

Time to Think

What do the following two scenarios have in common?
1. A professor gives a dense, fast-paced lecture with lots of slides. Students scribble down notes, trying to keep up. They need to get all the key information down before class ends.
2. In a flipped classroom, students go from one clicker question to the next. They talk about each question with a partner, and once all the answers are in and the instructor has expanded on them, go on to the next problem. No question takes more than a few minutes to get through.

These scenarios are taken from styles of teaching that are typically held up as polar opposites, yet I would argue that they are more similar than different. In particular, they fail the same way. In neither classroom is deep thought occurring. And it is not occurring for the same reason -- lack of time.

Ben Orlin has a typically charming post on barriers to deep thinking in school. However, I think he missed one. Students do not think deeply in school because there is no time for them to do so.

The primary requirement for thoughtful learning is time because the primary requirement for thought is time -- whether for private contemplation or for a conversation to proceed beyond the obvious. I have been to too many teaching workshops where participants were given a question to discuss in groups and just as the discussion was getting good, just as learning was starting to occur, we were interrupted and had to go on to the next question. Using fewer questions might have worked better.

The humanist and educator Diana Senechal wrote about similar experiences during her teacher training in her book Republic of Noise:

Just as I started to ponder a topic, I had to move into my group and start working and talking. The work seemed superficial and rushed. It seemed, moreover, that the groups reached predictable conclusions about what they read or did. The instructor would move from group to group, listening to each discussion for a few minutes. When, at the end of class, she pulled together the insights of the day, it seemed that many of the finer points had vanished.

In order for our students to have a chance to think, we must slow down. If you are lecturing, remove some material that students can read on their own and give them time to process. In a math class I teach, I've experimented with pausing after a long derivation and giving the students a minute or two to think through what just happened in whatever way they need, whether doodling on paper, discussing or staring off into space. I plan to try explicitly providing time for students to come up with questions to ask after covering a topic.

Another potential strategy to give students more time to think, mentioned on Susan Cain's Quiet Revolution blog, is to ask them a question at the end of a lesson that will be discussed next time. Ideally, the question should be one that deserves the time and solitude this approach provides. This is quite similar to inquiry-based learning in math and may be one of the reasons I like that approach.

Let's take the time to think of ways to give our students time to think.

Saturday, March 24, 2018

Two Ideas for Promoting Transfer

Often, one of the hardest things for students to do is apply a newly learned skill in a new setting, even one that looks almost the same to the teacher. The technical term for this is transfer, and promoting transfer is one of the most difficult tasks in education.

The key to transferring knowledge from one situation to another is noticing that, despite superficial differences, the two situations are somehow the same on a deeper level -- in technical terms, they share the same deep structure. For example, an arms race and the ice-albedo feedback loop that enhances warming at the poles are both situations where a change in some quantity (the number of weapons owned by country A, the amount of ice at the North Pole) leads to a further change in the same direction as the initial one. Both are positive feedback loops. Transfer would involve a student who learned about positive feedback loops in the context of arms races applying their understanding to climate change, or vice versa.
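
The shared deep structure can be written once and reused for both stories. Below is a minimal Python sketch of a positive feedback loop, with an invented gain and made-up units: each step adds a further change proportional to the change accumulated so far, so the deviation keeps growing in its original direction. `positive_feedback` is a hypothetical name, and the model is deliberately simplified (real arms races and ice-albedo feedback involve more than one variable).

```python
def positive_feedback(initial_change, gain, steps):
    """Return the accumulated change after each step of a positive feedback loop.

    Each step adds gain * (change so far), so with gain > 0 the change
    keeps growing in the same direction as the initial one.
    """
    change = initial_change
    history = [change]
    for _ in range(steps):
        change += gain * change
        history.append(change)
    return history


# Same function, two surface stories (all numbers are illustrative only).
arms_race = positive_feedback(initial_change=1.0, gain=0.5, steps=3)   # country A adds weapons
polar_ice = positive_feedback(initial_change=-1.0, gain=0.5, steps=3)  # some ice melts
print(arms_race)  # [1.0, 1.5, 2.25, 3.375]
print(polar_ice)  # [-1.0, -1.5, -2.25, -3.375]
```

A student who recognizes that both calls use the same function has, in effect, made the transfer.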

Two recent papers have described promising results in promoting transfer. One is an elaboration of methods that many teachers already use, while the other is fairly new (although we sometimes use a very similar one in LS 30 and related courses).

The first paper describes something called concreteness fading. It's exactly what it sounds like -- starting with a concrete example of a topic and then gradually moving toward a fully abstract one. In this particular study, the researchers taught second- and third-graders about equivalence problems of the type 2 + 5 + 3 = 2 + __. The teaching was done either through concrete examples (sharing stickers and making balances balance), paper-and-pencil math problems, or a concreteness fading condition that started with stickers and balances, then moved to paper representations of these things, and then moved to actual problems with numbers.
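
The balance idea behind these problems fits in a couple of lines. This Python sketch (the helper name `fill_blank` is hypothetical) treats the equal sign as a balance: the blank is whatever makes both sides total the same, rather than "the answer" to the left-hand side.

```python
def fill_blank(left_terms, right_known_terms):
    """Solve problems like 2 + 5 + 3 = 2 + __ by treating '=' as a balance."""
    # The blank must make the right side total the same as the left side.
    return sum(left_terms) - sum(right_known_terms)


print(fill_blank([2, 5, 3], [2]))  # 8
```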

[Figure from http://www.learningscientists.org/blog/2018/2/1-1]

After the initial learning stage, the kids were presented with problems, including word problems, more complex than anything they had been taught. This was the transfer stage. The kids who were taught entirely using concrete methods performed worst, followed by those taught abstractly. The ones taught using concreteness fading did best.

What happened? Students who only see concrete examples have a hard time generalizing. They may not see the deep structure of what they are doing. (Using a variety of examples may mitigate this but is not always practical.) Abstract learning is general but often difficult. Concreteness fading may bridge the gap between the two, making the abstract learning more effective.

The other study gave undergraduates a classic problem that was analogous to a story they had read. Most people find the analogy difficult to see unless told to look for it. However, their performance improved substantially (from a 10% success rate to a 25% one) if they were asked to come up with a problem analogous to the one they were trying to solve before actually solving it.

This is a very practical result. In some cases, it may be enough to ask students to come up with examples of a new concept, which I already do (there are a number of such problems in Modeling Life) and could do more of. For more complex problems, perhaps including programming problems, asking students to come up with a problem analogous to what they are trying to solve could make sense. At the very least, it's worth a try.