My Project

Oct 29 2011

I am hesitant to discuss much of this as I normally don't talk about my work.  There are good reasons for that.

You see... I'm the enemy.  I am a content specialist for a major producer of standardized tests.  My specialty is (duh) science.

I've been in this industry for 3 years now, and let me tell you, it's not what you think it is.  Unless you are in the industry or actively involved as a client, you can't imagine what it's like.  I can share a few things with you.

It takes over 18 months for a question to go from an idea in someone's head to an operational item (that means it's scored and the score counts for whatever the test is for).  Each question, depending on the project requirements, will be seen by 2-3 content specialists (usually each one more than twice), artists, copy editors, fact checkers, clients, client committees, and a bias/sensitivity expert before ever even seeing a test form.  Then there is field testing, data review, final review by the client... then it MIGHT get on a test.

The first thing most people think of when they hear 'standardized testing' is the recently ended No Child Left Behind and maybe Obama's Race to the Top programs.  I will say that it is my opinion that these standardized tests in these contexts are used for entirely incorrect purposes and at incorrect times.  But those are client decisions and "him what pays, says".  But there are a lot of tests that have to be standardized that you might not think about.  Every industry that has some kind of certification exam has standardized tests... nurses, IT techs, aircraft mechanics, etc. etc.  Those are generally used properly.

When I say properly, let me explain.  What is the purpose of a test?  To see if the tester knows something.  Now, a well designed test question will not only tell you if the student knows the information, but can also tell you why the student got it wrong.  That last bit is critically important, and it is why much of the high school testing... isn't properly used.  There's accountability with no chance at improvement.  If the tester doesn't learn, then there's very little point in doing it... if you don't learn, there's very little point in doing anything.
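To make the "why the student got it wrong" idea concrete, here is a minimal sketch in Python.  The item, its answer key, and the misconception notes are all invented for illustration; this is one way the idea could be modeled, not a description of how any real testing system stores its items.

```python
from collections import Counter

# Hypothetical item: each wrong option (distractor) is written to match one
# specific misconception. The stem, key, and notes are invented examples.
ITEM = {
    "stem": "Where does most of the mass of a growing tree come from?",
    "key": "B",  # assumed correct option: carbon dioxide from the air
    "distractor_notes": {
        "A": "thinks plants build mass from soil",
        "C": "thinks plants build mass from water alone",
        "D": "thinks sunlight is converted directly into matter",
    },
}


def diagnose(item, response):
    """Return (is_correct, note) for one tester's response."""
    if response == item["key"]:
        return True, None
    # An unexpected response (blank, stray mark) gets a generic note.
    note = item["distractor_notes"].get(response, "no diagnosis available")
    return False, note


def class_report(item, responses):
    """Count how often each misconception shows up across a group."""
    notes = (diagnose(item, r)[1] for r in responses)
    return Counter(n for n in notes if n is not None)


report = class_report(ITEM, ["A", "B", "A", "D", "B", "A"])
for note, count in report.most_common():
    print(f"{count} tester(s): {note}")
```

The point of the sketch is only that a wrong answer carries information: if half the class picks the same distractor, that tells the teacher what to reteach.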

A properly designed test should have a diagnostic component, which is a pre-test.  What does the tester know now?  It can identify areas for improvement and even (sometimes) offer clues into the misconceptions the tester has so they may be taught correctly.  Any assessment is a tool that students, teachers, parents, and state officials can use to see what's going on with education at their level.  Unfortunately, it's not being used this way (mainly because it is expensive).  Again, there's a big difference between public school assessments (which are free for the students to take) and professional certification tests that are not free.

But why a standardized test?  Well, that just means that over a given period or group, all the testers take the same test.  There are several reasons for this.  One is so that scores can be compared between students, schools, classrooms, socioeconomic groups, gender, ethnicity, etc.  And yes, we do compare every test question in every single one of these ways to check for issues.  Because tests are standardized, they can even be compared year to year.  Usually a group of questions is carried over from one year to the next and these form the basis of some extensive statistical analyses to determine how students compare year over year.  It is truly staggering the amount of information that is developed from these tests.

It can go even further.  A few of you may remember Obama's "Sputnik Moment".  Well that's from another standardized test (PISA) that is given to students all over the world.  The same questions given to students in 70+ countries.  The US didn't do so well in the latest one, hence the "Sputnik Moment".

Another complaint that people often have about standardized testing is that it is too easy to guess.  99% of the time, the questions are 4-option multiple choice.  Well, that is changing.  A number of industry-leading companies have a variety of new products out.  There are hot spot items, where a tester selects one or more portions of an image and the computer tabulates the location of each click to determine a score.  There is drag and drop, which is a glorified matching question, but often with some advanced features.  There is even some significant research into computer-scored essay questions.  I've seen a demo and it is absolutely stunning.  It is not a word count type of system.  It is a learned relational database.  It can tell the difference between a BS answer with lots of technical terms and one that has the exact same terms, but correct.  I've seen it.  It is truly amazing tech.
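For the curious, here is a rough Python sketch of how a hot spot item might turn clicks into a score.  Everything in it is an assumption made for illustration: the rectangular regions, the pixel coordinates, the example stem, and the "correct hits minus incorrect hits" rule.  Real products may well use different region shapes and scoring rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle on the image, in pixels (hypothetical format)."""
    name: str
    left: int
    top: int
    right: int
    bottom: int
    is_correct: bool

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def score_hot_spot(regions, clicks):
    """Score = correct regions hit minus incorrect regions hit, floored at 0.

    The scoring rule is an assumption; an item could just as well be scored
    all-or-nothing. Clicking the same region twice counts once.
    """
    hit = {r for r in regions for (x, y) in clicks if r.contains(x, y)}
    right = sum(1 for r in hit if r.is_correct)
    wrong = sum(1 for r in hit if not r.is_correct)
    return max(0, right - wrong)


# Hypothetical item: "Select the two organelles that make ATP."
REGIONS = [
    Region("nucleus", 100, 100, 160, 160, is_correct=False),
    Region("chloroplast", 200, 80, 240, 120, is_correct=True),
    Region("mitochondrion", 60, 200, 110, 230, is_correct=True),
]

clicks = [(210, 100), (80, 215), (400, 400)]  # the third click hits nothing
print(score_hot_spot(REGIONS, clicks))
```

Subtracting for wrong regions is one simple way to keep a tester from earning full credit by clicking everywhere.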

Sorry for the digression, but I hope that this has given you some insight into the industry.  Like any industry, there is a lot of proprietary technology, processes, clients, etc.  I can't get into that.  If you have any questions, then I'll try to answer them if I can... the more general the better.

But standardized testing is here to stay.

Now, on to my project that I am epically excited about.  This is really a pinnacle-of-the-career type of thing.  I am responsible for the development of the science standards for a MAJOR client.  This isn't statewide or even national.  We are likely to go multi-national with it.  Now, I'm not doing this by myself.  There is the client, various advisory committees, consultants, consultant groups, and a host of businesses all involved.  But I'm the guy that is actually putting the words on paper.  Which means, a lot of what I say will be incorporated into the science standards.  I've made a number of changes and recommendations and the client seems to be pleased.

My trip to New York, next week, will be the first of a series of committee reviews of these standards.

When I think about it, which I try not to do, I am excited that I am working on such a major project.  Then I get seriously nervous.  What if I say something wrong, what if I didn't push hard enough to get something vitally important in or get something that ends up a waste of time out?

I actually started another draft document today and that's why I'm thinking about this now.  We're talking about being an influence (however small it might be) on literally hundreds of thousands of students a year.

I can say for certain that evolution will be a major theme.  The client unambiguously agrees with me and a consultation group that I assembled from experts in science education.  We're not going to beat around the bush either.  Common descent, speciation, selection, etc. will all be fair game.  I am very happy about that.  Even if students don't believe it, they still have to learn it and they will learn what evolution is really about instead of the misinformation that is promoted almost everywhere in the US.

I'm just babbling now... and as usual... I'm not sure where to stop.  So, if I can answer any questions you might have, let me know.

13 responses so far

  • I relatively recently took the GRE, and I am really interested in how the computer-based essay-scoring works. Do you have any insight into that (that you are allowed to share)?

    Project sounds cool! Big up for evolution.

  • I'm not sure if the GRE does a computer scored essay or not. If you aren't familiar with the scoring process, it goes something like this.

    Writers create writing prompts. These are what you see. "Describe a trip to a museum." or something like that. Then a group within the testing company examines the prompt to make sure that it can be scored in a reasonable fashion.

    By that I mean, it has to be scored quickly and accurately, with an easily described rubric. Basically, this group will hire some 300-750 scorers right after the test to do the grading. If the summer is available, then they will pick up a bunch of teachers, but that's not normally the case. You get a very random mix of people. So the scoring process has to be very carefully constructed to reduce the uncertainty as much as possible.

    For the museum prompt, you might say that the tester must mention the name of the museum, what it is a museum of, how long they spent, and 3-4 things that they saw at the museum. That's the rubric. If they did all that, then they get the max possible score (usually a 4). If they only described 1 thing that they saw, then it might be a 3. If they didn't tell how long they spent and where the museum was (as well as only 1 thing), then it's a 2. If they wrote about the time they took their dog to the vet, then the score is a 1. (And yes, people do that.)

    Note that this has nothing to do with grammar and spelling. That's a whole 'nother ball game and may or may not be a part of the score.

    So a scorer reads the essay, compares the results to the rubric, and assigns a score. Hopefully this takes less than 5 minutes. A large state might have 250,000 essays to be graded in a week or less. The test then goes to a second scorer who does the same thing. If the scorers assign the same score, then that's your score. If they disagree, then it goes to a professional scorer to evaluate.

    Now, with an automated scoring system, that museum question is too generic. There is too much leeway and the system won't be able to sort it out. That one person who spent three days at Ken Ham's creationism museum will just be too far away from the others to be understandable.

    What you need for automated scoring is a tighter prompt. Something like "explain the process of photosynthesis". There really is only one correct answer here. There's some leeway in minor details, but all of the answers should be very, very similar. This would be an ideal candidate for automated scoring.

    Human scorers do the first 500-1000 essays. These essays and the assigned scores are loaded into the relational database. Now this is not an evolving database. If you send one essay in first and send it again last, it will get exactly the same score.

    The other thing about these scoring systems is that the longer the answers, the better the scoring results will be. You could probably describe photosynthesis in one sentence, but that would be very variable. Instead, with more words and sentences, it is easier for the system to 'understand' what is being said.

    As far as the technical details... no, I don't know.

    Here's a white paper given by the same group that I saw the demo for: http://www.pearsonassessments.com/hai/images/tmrs/PearsonsAutomatedScoringofWritingSpeakingandMathematics.pdf

    Here's a list of papers by ETS: http://www.ets.org/research/topics/as_nlp/bibliography

    I will say this: by all indications, this is the wave of the future. It's cheaper, faster, and more accurate. I suspect that all essays will be graded with tech like this in the near future.
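The human scoring workflow described in the reply above (a rubric, two independent readers, and a third expert read on disagreement) can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the checklist fields, the exact cut-offs between a 4, 3, and 2, and the stand-in for the expert reader are all hypothetical, since the museum example only outlines the rubric loosely.

```python
from dataclasses import dataclass


@dataclass
class Checklist:
    """What one human reader ticked off for the museum prompt (hypothetical)."""
    on_topic: bool
    names_museum: bool
    says_what_kind: bool
    says_how_long: bool
    things_seen: int


def rubric_score(c):
    """Map a checklist to a 1-4 score, loosely following the museum example."""
    if not c.on_topic:
        return 1  # e.g. the essay about taking the dog to the vet
    basics = c.names_museum and c.says_what_kind and c.says_how_long
    if basics and c.things_seen >= 3:
        return 4
    if basics:
        return 3  # all basic details present, but too few things described
    return 2      # on topic, but missing some basic details


def final_score(first, second, adjudicate):
    """Two independent reads; a third, expert read only on disagreement."""
    if first == second:
        return first
    return adjudicate()


# Two readers tick slightly different boxes for the same essay.
read_1 = Checklist(True, True, True, True, things_seen=1)
read_2 = Checklist(True, True, True, False, things_seen=1)
s1, s2 = rubric_score(read_1), rubric_score(read_2)

# The lambda stands in for the professional scorer's judgment.
print(s1, s2, final_score(s1, s2, adjudicate=lambda: 3))
```

Keeping the rubric this mechanical is what lets a very mixed pool of scorers land on the same number most of the time, so the expert only sees the disagreements.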

  • Joe G says:

    Could you please elaborate on this alleged "misinformation that is promoted almost everywhere in the US."

    Or was that just babble too?

  • Yes Joe, misinformation about evolution like some of the things YOU have said like:

    Evolution is totally random/chance
    there are no transitional fossils
    evolution denies God (or a designer)
    Evolution is just a theory
    x can't evolve because it's too complex
    etc. etc. etc.

    But this thread isn't about evolution. In fact, not a single post I have made here is about evolution, I would appreciate you staying on topic.

    Thanks

  • Joe G says:

    I have never said that "x can't evolve because it's too complex"

    I have never said that evolution is totally random/ chance. The way the modern theory of evolution is posited the mutations are entirely by chance- not directed, no purpose- and "selection" is a result- no one can predict what mutation will occur and no one can predict what will be selected for (Dennett). So what do you think is non-random about it?

    With intelligent design evolution- well yeah, that is mostly non-random.

    Transitional fossils? I say it is a sliding definition

    As for evolution and God:

    "In other words, religion is compatible with modern evolutionary biology (and indeed all of modern science) if the religion is effectively indistinguishable from atheism."

    "The frequently made assertion that modern biology and the assumptions of the Judaeo-Christian tradition are fully compatible is false"

    "Evolution is the greatest engine of atheism ever invented.

    Naturalistic evolution has clear consequences that Charles Darwin understood perfectly. 1) No gods worth having exist; 2) no life after death exists; 3) no ultimate foundation for ethics exists; 4) no ultimate meaning in life exists; and 5) human free will is nonexistent"

    "As the creationists claim, belief in modern evolution makes atheists of people. One can have a religious view that is compatible with evolution only if the religious view is indistinguishable from atheism"

    "Let me summarize my views on what modern evolutionary biology tells us loud and clear … There are no gods, no purposes, no goal-directed forces of any kind. There is no life after death. When I die, I am absolutely certain that I am going to be dead. That’s the end for me. There is no ultimate foundation for ethics, no ultimate meaning to life, and no free will for humans, either."

    All of that was from Will Provine- he is a professor at Cornell University

    And you did mention evolution- but obviously you were just babbling.

  • Joe, read what I said carefully. I'll let this comment remain, but this is not the place for you. This is a blog about science.

    It is not a place for your strawmen, your argument from authority, and your arguments from ignorance.

    If anyone is interested in hearing what Joe has to say, he has a thread on the Panda's Thumb forum here: http://www.antievolution.org/cgi-bin/ikonboard/ikonboard.cgi?s=4eadf8af356dc315;act=ST;f=14;t=6647

    The comments in that thread are not polite and not for the faint of heart.

    Further commentary from Joe will be moderated.

  • sallyreece says:

    Wow, I just browsed Joe's thread that you linked, what a horrid young man he must be. I don't like coming down on teenagers as we all were a little daft at that age. Still I hope he reconsiders his behaviour before posting again.

  • Sadly, he's in his late 40s - early 50s.

  • Joe G says:

    Geez Kevin- You don't have to lie- What strawman have I ever presented? What about your claim of my ignorance?

    All you have are arguments from made-up authority.

    Good job.

  • Joe G says:

    Hi Sally,

    Strange that you come down on me and not the disgusting posters on that blog Kevin linked to.

    I take it that means you condone the actions of liars and bullies.

    Good for you....

  • sallyreece says:

    Are you sure about that Kevin? If it's true then I don't know what to say. There must be something else going on for him to behave in such a fashion.

    Hello, Joseph. I'm afraid I shan't respond to your points unless you find it within yourself to apologise for your behaviour thus far. It's no use trying to point fingers at others, we must take responsibility for our own actions. If it is true about your real age then you should already know this.

    Good afternoon.

  • Sally, yep as sure as I can be.
