
Post a Comment On: Steve Sailer: iSteve

"How does PISA really work? "Fix it in Post""

9 Comments

Anonymous Anonymous said...

There was some talk of how Finnish-speaking Finns get better scores than Swedish-speaking Finns despite similar social backgrounds.
The PISA test is a literacy test even in maths, if you look at the questions, so language plays a greater role than in TIMSS, where Asians still score similarly to their PISA results while the Finns flounder.

12/4/13, 9:39 AM

Blogger panjoomby said...

still, item response theory lets you equate item difficulties - useful for anchoring - no need to give all the same items in all the countries if certain items act the same across countries. it's useful for finding items that act differently in different languages (say it's an easy item for average-ability English speakers, but a difficult item for high-ability Korean speakers - then it's a bad item in Korean, so you toss it out, but you still have enough items that act similarly to be able to equate the test across countries!). altho, an item may be difficult in any language b/c it's truly difficult - OR it might simply be a bad item in any language. sorry, IRT is probably a comment thread ender!

12/4/13, 10:58 AM
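[Editor's note: panjoomby's point about tossing items that "act differently" across languages can be sketched in a few lines. This is a minimal illustration of a 2-parameter logistic IRT model with a crude differential-item-functioning check; the item parameters, group labels, and tolerance are all made up for the example, and real PISA procedures are far more involved.]

```python
import math

def p_correct(theta, a, b):
    """2-parameter logistic IRT model: probability that a test-taker
    with ability theta answers correctly an item with discrimination a
    and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item parameters estimated separately in two languages.
item_en = {"a": 1.2, "b": 0.0}   # moderate difficulty for English speakers
item_ko = {"a": 1.2, "b": 1.5}   # much harder for Korean speakers

def acts_differently(item1, item2, tol=0.5):
    """Crude DIF check: flag the item if its difficulty estimates
    differ by more than tol logits across the two groups."""
    return abs(item1["b"] - item2["b"]) > tol

print(acts_differently(item_en, item_ko))  # True -> toss the item
print(acts_differently(item_en, item_en))  # False -> keep it for anchoring
```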

Anonymous David said...

A shot in the arm might be a question about "The Fast and the Furious."

12/4/13, 11:04 AM

Blogger Steve Sailer said...

panjoomby, thanks for clueing us in on Item Response Theory. Am I right in saying a big advance over Classical Test Theory is that you can now fix a lot of stuff after giving the test?

But, is there a danger of over-fixing the results? Are there standards for what not to do in post?

12/4/13, 11:07 AM

Anonymous Anonymous said...

What's interesting is the interest in the PISA by US elites even though the PISA seems sort of like a relic, an anachronism from the heyday of the industrial nation-state, which is passe among Western/globalist elites. The industrial nation-state was all about meeting basic needs like education for the majority and raising the majority to basic standards of competence so they could be competent citizens for industrial enterprises, the national government, military, administrative apparatus, etc. The PISA measures how competent and organized a country is at supplying basic education to its majority.

12/4/13, 11:15 AM

Anonymous jody said...

observed problem difficulty between groups is one of the ways that test validity is established in research psychology. if a problem is difficult for all groups, regardless of how often each particular group tends to get it right, then the question is considered hard for everybody. if the problem is difficult only for one particular group, then it is probably not a useful problem for testing purposes and will likely be discarded.

add up a bunch of problems which all behave this way - every group seems to encounter the same ramp up in degree of difficulty from question to question - and you have a pretty good test battery. problem 1 is easy for everybody, problem 2 is harder for everybody, problem 3 is hardest for everybody, and so forth.

the difficulty of each problem can be decreased to create a hurdle that almost anybody of any group can clear, which is how the civil service exams have been written, or increased to a wall so high that only a few people of a few groups can climb it.

strictly from a testing perspective, problems which almost nobody from some groups can solve are not necessarily bad items. as long as test-takers from every group still have a lot of trouble solving the problem, the problem is still a good one, and is useful for raising the ceiling to which the test can measure.

objections to those kinds of problem sets are generally raised from people outside the field of psychology, not from psychologists themselves.

12/4/13, 11:41 AM
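[Editor's note: jody's "same ramp-up in difficulty for every group" criterion can be expressed as a simple ordering check. The per-group pass rates below are invented for illustration; real analyses compare difficulty orderings statistically, not with a strict-inequality test.]

```python
# Hypothetical proportions correct on a 4-item battery, item 1 through
# item 4, for two groups that may differ in overall ability.
pass_rates = {
    "group_A": [0.90, 0.70, 0.45, 0.20],
    "group_B": [0.80, 0.55, 0.30, 0.10],
}

def same_difficulty_ramp(rates):
    """Check that every group finds the items progressively harder,
    i.e. the proportion correct strictly decreases from each item to
    the next, regardless of each group's overall level."""
    return all(
        all(earlier > later for earlier, later in zip(r, r[1:]))
        for r in rates.values()
    )

print(same_difficulty_ramp(pass_rates))  # True -> a coherent battery
```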

Blogger Power Child said...

"We'll fix it in post" are known to production guys as the five most expensive words in filmmaking. Having worked in post, I can say they're still very expensive but getting less so.

12/4/13, 1:05 PM

Blogger panjoomby said...

yep, the problem is similar to deciding when an individual data point "qualifies" as an outlier - how outlying does it have to be before you decide it's screwed up, & how much "behind the scenes" info do you have to explain why it's so? fixing things in post is subjective. IRT lets you accurately predict what % of the population will get the next item right - say, at such & such an ability level a person has a 99% likelihood of getting the next item right, etc. so when an item doesn't follow the prediction, you can't trust the item. but deciding how bad is bad - how far off the mark does it have to be before you yank it - well, that's subjective empiricism at its best!

12/4/13, 5:52 PM
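[Editor's note: panjoomby's "item doesn't follow the prediction" idea is an item-fit check. Below is a minimal sketch using a 1-parameter (Rasch) model: compare observed proportions correct at a few ability levels against the model's prediction, and flag the item if any residual exceeds a threshold. The observed proportions, difficulty estimate, and tolerance are all hypothetical; his point about subjectivity lives entirely in the choice of `tol`.]

```python
import math

def rasch_p(theta, b):
    """1-parameter (Rasch) model: predicted probability that a person
    of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Hypothetical observed proportions correct at three ability levels
# for one item whose estimated difficulty is b = 0.0.
observed = {-1.0: 0.28, 0.0: 0.47, 1.0: 0.55}
b = 0.0

def misfits(observed, b, tol=0.10):
    """Flag the item if any observed proportion strays more than tol
    from the model's prediction -- a crude item-fit check, where tol
    is the subjective 'how far off the mark' cutoff."""
    return any(abs(p - rasch_p(theta, b)) > tol
               for theta, p in observed.items())

print(misfits(observed, b))  # True -> can't trust this item
```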

Anonymous Anonymous said...

Wow, comparing IQ tests to filming a spectacular action sequence with explosions and airhead actors reading banter from a script. I take back everything I ever said about "psychometrics" lacking empirical rigor. Thanks, Steve.

12/5/13, 12:19 AM
