Thread: winograd test and deep learning

  1. #1
    Registered User
    Join Date
    Apr 2017
    Posts
    80

    winograd test and deep learning

    (Maybe this Q should be moved to general discussions?)

    I've got a question which is about two things: deep learning and the Winograd schema.

    For anyone who doesn't know, the Winograd schema is a suggested replacement for the Turing test. The basic idea: if someone supplies some software/system and says, "this is intelligent", how do we actually know if that's the case? That's what these tests aim to answer. The Winograd test does it by giving the system loads of questions in this style:
    "The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?"
    So there's a situation description involving two entities, then a reference to one of those two entities using a pronoun (they, it, he, ... ), and then the question is: who/what is that pronoun referring to? In order to answer, common sense and general knowledge are required, along with an actual understanding of the situation and the interactions.
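
    To make the question format concrete, here is a minimal sketch in Python of how one such question could be represented. The class and field names are made up for illustration, not part of any official challenge format. The second question is the well-known twin of the example above, where swapping one word ("feared" for "advocated") flips the answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WinogradQuestion:
    """One question: a sentence, an ambiguous pronoun, and two candidates."""
    sentence: str      # situation description containing the pronoun
    pronoun: str       # the pronoun to resolve
    candidates: tuple  # the entities the pronoun could refer to
    answer: str        # the referent a human reader would pick

    def is_correct(self, guess: str) -> bool:
        return guess == self.answer


CANDIDATES = ("the city councilmen", "the demonstrators")

feared = WinogradQuestion(
    sentence="The city councilmen refused the demonstrators a permit "
             "because they feared violence.",
    pronoun="they",
    candidates=CANDIDATES,
    answer="the city councilmen",
)

# Same sentence with one word changed: the pronoun now points the other way.
advocated = WinogradQuestion(
    sentence="The city councilmen refused the demonstrators a permit "
             "because they advocated violence.",
    pronoun="they",
    candidates=CANDIDATES,
    answer="the demonstrators",
)

for q in (feared, advocated):
    print(f"{q.sentence} -> '{q.pronoun}' = {q.answer}")
```

    The two sentences are nearly identical on the surface, so a system that always picks, say, the first-mentioned entity gets one right and the other wrong.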

    So, my question. I don't know that much about deep learning, but from what I do know, it seems to be mainly about giving the system the end goal(s), or examples of them, plus access to the necessary stuff/context/information/system, and then the deep learning system goes to work to figure out how to achieve the end goal within the given system/situation.

    Would deep learning systems be able to solve Winograd problems? Bearing in mind the textual descriptions in the Winograd questions can be about anything, so would potentially cover all human general knowledge of life and everything.

    If I'm basically right about how deep learning operates, clearly it's no good for creativity. It's not going to come up with creative new ideas. That's simply not how it works. You give it the end answer, it goes to work to work out a way of achieving that end answer. It might come up with a creative route to the specified goal, but it's not going to come up with new interesting end results. The Winograd answers aren't exactly in the category of creative new ideas, but on the other hand the end goals are as many as the questions. It seems to me the answers to the Winograd questions are somewhere between creative new answers and specified end goals. They're neither, they're in between those two things.

    Discuss.

    Thanks.
    Last edited by BpB; 01-18-2018 at 04:15 AM.

  2. #2
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    Further two things.

    By deep learning, I'm talking about the method of AI which is famous for learning and playing Atari video games by being given some screenshots of desirable end results. I think most AI that's up and running now, which is proving to be of any use, harks back to, is based on, deep learning?

    And just to state it clearly, my main question is: could deep learning solve Winograd schema problems or not, do you think?
    Last edited by BpB; 01-18-2018 at 05:35 AM.

  3. #3
    misoturbutc Hodor
    Join Date
    Nov 2013
    Posts
    1,791
    Why did the permits fear violence?

  4. #4
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    Sorry, am I asking this in the wrong forum? I guess so. I'll try elsewhere. Thanks.
    Last edited by BpB; 01-19-2018 at 01:58 AM.

  5. #5
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    For what it’s worth, in case someone’s interested, from the deep learning Wikipedia page:

    Research psychologist Gary Marcus noted:

    "Realistically, deep learning is only part of the larger challenge of building intelligent machines. Such techniques lack ways of representing causal relationships (...) have no obvious ways of performing logical inferences, and they are also still a long way from integrating abstract knowledge, such as information about what objects are, what they are for, and how they are typically used. The most powerful A.I. systems, like Watson (...) use techniques like deep learning as just one element in a very complicated ensemble of techniques, ranging from the statistical technique of Bayesian inference to deductive reasoning."[163]
    Deep learning completely lacks any general knowledge of things and their interactions, and any logic, which is exactly what is required to answer Winograd questions. So no, it would seem deep learning alone wouldn’t have a chance of solving a Winograd problem.

    Do people not even know about the Winograd schema thing? There’s a competition in the next few weeks with a $25k prize at some AI convention (The Winograd Schema Challenge at AAAI-18: Announcement | AAAI 2018 Conference). It’s been run before, a couple of years ago I think, and the highest score, if memory serves, was something like 58%, which is pretty abysmal. 50% is rock bottom (choosing answers randomly, coin flip style, would get you about 50% because there are only two possible answers per question, although I think some have three). The human score (an average of many people answering many Winograd Qs) was about 91% if memory serves, which is slightly surprising; I’d have expected something around 99%. Be interesting to see how well the best fares this time.
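
    The "rock bottom" figure can be worked out directly. This is a small sketch of the expected score from pure random guessing; the function name and the question counts (60 questions, 10 of them with three candidates) are made-up numbers for illustration, not the real competition set.

```python
from fractions import Fraction


def chance_accuracy(candidate_counts):
    """Expected accuracy of uniform random guessing.

    candidate_counts holds one entry per question: the number of
    possible answers for that question.
    """
    if not candidate_counts:
        raise ValueError("need at least one question")
    total = sum(Fraction(1, n) for n in candidate_counts)
    return total / len(candidate_counts)


# Hypothetical test sets: every question binary, and a mix of 2- and 3-way.
all_binary = chance_accuracy([2] * 60)
mixed = chance_accuracy([2] * 50 + [3] * 10)

print(f"all two-candidate questions: {float(all_binary):.1%}")
print(f"50 two-way + 10 three-way:   {float(mixed):.1%}")
```

    With only two-candidate questions the floor is exactly 50%; any three-candidate questions pull it slightly below that, so a score of 58% is not far above guessing.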
    Last edited by BpB; 01-19-2018 at 03:28 AM.

  6. #6
    misoturbutc Hodor
    Join Date
    Nov 2013
    Posts
    1,791
    Does the Winograd schema test thing have an end answer that deep learning can work towards though? The "schema questions" to me don't seem to have anything to "latch onto" to learn (based on my limited exposure and understanding of them).

  7. #7
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    No, that’s exactly it. The end answers are too varied, numerous and unknown. That’s why I said “The Winograd answers aren't exactly in the category of creative new ideas, but on the other hand the end goals are as many as the questions. It seems to me the answers to the Winograd questions are somewhere between creative new answers and specified end goals. They're neither, they're in between those two things.”

    Deep learning gets given an end goal (or more usually probably a bunch of separate end goals or answers), then it goes to work with loads of varying data which leads to, somehow, one of its end goals, and learns techniques to get to the given end answer/goals. Then it knows how to do it in the moment quickly. Eg speech recognition, voice to text. There’s a whole lot of auditory versions of people saying dog which leads to the word “dog”. You (or deep learning systems) learn how to identify the characteristics of sounds which are intended to lead to the word “dog”, so you’re flexible, high voice, low voice, distorted voice etc.

    Humans do solve winograd like problems all the time when they resolve pronouns, like my use of “they” about 10 words ago, it’s obvious without thinking I’m meaning “humans” not “winograd like problems” there. It’s one of those things we do effortlessly, without conscious thought, but it requires massive amounts of knowledge, pre existing in our heads, not just dictionary factual like knowledge, but also operations/functions/relations - *meaning* - and more nuanced aspects like cultural and social norms (see John’s birthday example below).

    To do it obviously we have an underlying logic, schema, relational general knowledge, so even though you’ve probably never come across the exact sentence “Humans do solve winograd like problems all the time when they resolve pronouns” ever before, you effortlessly resolve who/what “they” refers to, because you know resolving is a verb related to solve, and humans can do the operation solving, and problems are things which can be solved.

    In order to solve winograds reliably, you’re going to need general knowledge which covers whatever the textual description is about, and because you don’t know in advance what topics are going to be covered, you need a full spread, human like, of general knowledge, and that general knowledge has to include, critically I think, and this is why I reckon the winograd problems are good, how things interact, operations, functions, including the social norms behind those! Eg (this is a winograd I’ve just made up):

    It’s John’s birthday today and John wanted a new set of golf clubs. Peter has given John some new golf clubs for John’s birthday present. He is really happy.

    In that, it’s obvious that by “he” I mean John. But Peter wouldn’t be logically or factually incorrect; giving is enjoyable. It’s just more likely I mean John. That’s the normal meaning of that sentence. If I wanted to convey that Peter was happy, then I’d have had to make that clear in some way. So that’s, I don’t know, a kind of social norm. Blimey, good luck with writing code which learns and stores that across the board.

    That’s exactly why these winograd tests are possibly an excellent replacement for Turing tests, and are good tests of intelligence. Although only a base level of intelligence right? It doesn’t test human intelligence very well.


    To solve the problem, with a computer, across any/all topics, reliably, is massively hard. More I think about it the better the test is, and is an excellent replacement for the Turing test.

    I’ve just started doing a psychology course (which is part of the reason I’m asking about this BTW), in cognitive psychology we’ve just covered schemas, the psychology version.

    It’s a theory of memory and learning in cog psy. Piaget (the child development, metaphors guy) was into schemas, as are a bunch of others.


    I suppose, in theory, deep learning could be applied to Winograd problems, but the data set would be humungous: all the words a human has read and heard up to the age of say 20. Actually no, that wouldn’t even be the data set, that would be the end goals! (Apart from the incorrect/illogical stuff you hear/read.) It’s not the way to do it. You’d need something like schemas, an underlying relational logic.

    So far as deep learning and Winograd schemas go, it’s almost like the data set and end goals are one and the same.



    Interestingly, looking at the rules of the competition on the Winograd page I linked to, to compete you turn up with a laptop, any commercially available one, which will run the application to solve the problems. And minimal internet access is allowed (I think that might be quite pivotal: exactly how much internet access can you have, and to what?).

    Basically you’d have to store most standard general knowledge in some relational way (ie not too different to what the brain does) on a laptop - and of course work out a way of representing that and collecting it. I don’t know what kind of a system IBM’s Watson (mentioned in that quote about deep learning above) runs on, but I bet it wouldn’t run on a laptop. It appears Watson may be capable of solving winograd schemas. But it couldn’t win the competition because it wouldn’t run on a laptop.

    Watson won Jeopardy. I still don't really understand that game (I'm not from the US) but from what I do know, it suggests that Watson could solve winograd problems, maybe.

  8. #8
    misoturbutc Hodor
    Join Date
    Nov 2013
    Posts
    1,791
    I think we're out of sync timezone-wise so I apologise for very delayed responses.

    After I got home today I did some reading on this subject and it's actually quite fascinating (mind-numbing as well). The requirement that any solution needs to run on a commercially available laptop ups the ante considerably, even noting that you can, of course, use the GPU.

    Watson employs a cluster of ninety IBM Power 750 servers, each of which uses a 3.5 GHz POWER7 eight-core processor, with four threads per core. In total, the system has 2,880 POWER7 processor threads and 16 terabytes of RAM
    Yeah, not quite a laptop
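
    A quick sanity check of the quoted figures, as a small Python sketch. The server, core and thread counts are taken at face value from the quote above; the 16 GB laptop is an assumed figure for comparison only, and terabytes are treated as decimal (1 TB = 1000 GB).

```python
# Figures as quoted above.
SERVERS = 90
CORES_PER_SERVER = 8       # one eight-core POWER7 per server, as the quote reads
THREADS_PER_CORE = 4
WATSON_RAM_GB = 16 * 1000  # 16 TB, decimal units

# Assumption for comparison: a laptop with 16 GB of RAM.
LAPTOP_RAM_GB = 16

total_threads = SERVERS * CORES_PER_SERVER * THREADS_PER_CORE
ram_ratio = WATSON_RAM_GB // LAPTOP_RAM_GB

print(f"threads: {total_threads}")
print(f"Watson RAM is about {ram_ratio}x the assumed laptop")
```

    The thread count multiplies out to the 2,880 stated in the quote, and the memory gap against the assumed laptop is three orders of magnitude.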

  9. #9
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    > The requirement that any solution needs to run on a commercially available laptop ups the ante considerably even noting that you can, of course, use the GPU

    Indeed. The internet access, what’s allowed and what isn’t, seems really critical to me. I don’t think the processing power is the main issue, it’s storage. You could use exceptional amounts of processing power, and working memory, to set up your memory/processing network. You’d do the work upfront. But then how big is the end result of that work? Small enough to fit on a laptop? Also, for any questions on topics your upfront work doesn’t cover, it’d be internet and processing carried out in the test time. Plus if you include that kind of contingency plan (for when questions are on topics you haven’t covered, and given the full spread of general knowledge it’s a given really, isn’t it?) then you need the code for processing and generating the network, and that processing’s working storage (which I imagine would need to be big), irrespective of processing power, would require a lot of space. The way I see it is, once the characteristics, the particular salient aspects, have been worked out, not too much space would be required for a particular concept, but to get to that point, a lot of space would be required, because of the workings as it were.

    You’d start with a dictionary including definitions. Use that as a basis. Then in addition some supply of ludicrous amounts of writings, I don’t know, I think such things exist, or just use the internet/Google; problem is with that there’s a lot of mess/rubbish/illogicalness. The processing would be same basic logic applied throughout, object oriented style, that is simple same rules applicable and applied at all levels throughout. Pattern finding, and patterns of patterns etc. Frequencies, relations, comparisons. You’d use the dictionary as the basis, and be looking to generate some sort of schema, not too dissimilar to the one pictured above out of the dictionary, of the dictionary. But that’s a graph basically. Graphs aren’t enough. I don’t know the details of category theory, but from what I understand it’s like graph theory but souped up. In particular, there are types of relations. In graph theory the only relation is a simple dumb link. Anyway, you’d do ........ loads of processing upfront, not using a laptop, to generate a kind of schema based on pattern finding and a base set of definitions -- you’re kind of looking to give a dictionary (and then some extra stuff probably) life kind of, or at least actual operational logic. The brain’s receptors are the way they are because of its experience. It’s an interplay between outside stimuli and current mental state (and biological state).
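
    The difference between a "dumb link" and a typed relation can be sketched in a few lines of Python. This is only an edge-labelled graph, not category theory, and the class name, relation names and facts are invented for illustration (the facts borrow the earlier point that humans can do the solving while problems are the things that get solved).

```python
from collections import defaultdict


class RelationStore:
    """Stores (subject, relation, object) facts as labelled edges."""

    def __init__(self):
        # (subject, relation) -> set of objects
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def holds(self, subject, relation, obj):
        """Typed view: does this specific kind of relation hold?"""
        return obj in self._edges.get((subject, relation), set())

    def linked(self, subject, obj):
        """Untyped view: is there any link at all from subject to obj?"""
        return any(
            s == subject and obj in objs
            for (s, _), objs in self._edges.items()
        )


store = RelationStore()
store.add("human", "can_perform", "solve")   # humans do the solving
store.add("problem", "can_undergo", "solve") # problems get solved

for noun in ("human", "problem"):
    print(noun,
          "linked:", store.linked(noun, "solve"),
          "can_perform:", store.holds(noun, "can_perform", "solve"))
```

    With plain links both "human" and "problem" look equally connected to "solve"; only the relation type says which of them can be the one doing the resolving.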

    An interesting question follows from the fact that the Winograd problems aren’t useful for measuring human intelligence: unless there’s something wrong/unusual, or they’re very young, pretty much any human who understands language (and only unusual or ill humans don’t understand language) can solve Winograd schemas effortlessly. The test only measures a basic level of competence of intelligence. So, what would a higher level of artificial intelligence be? And, at what point does “artificial” get dropped? At some point, that’s going to be clearly a stupid name.

    > Yeah, not quite a laptop

    Right, yup, that’s funny.

    Would be interesting to know if Watson can or at least could solve Winograds. The great thing about that deep learning, and the fact it learnt to play some Atari video games, is that no one programmed it to play such, or any, games (unlike IBM’s chess computer which beat Kasparov). It learnt. I suppose there is an analogy between lots of different versions of audio data which all lead to the word “dog”, and lots of interactions which lead to ball in goal (or whatever winning the game is). Here’s the paper about deep learning learning to play Atari games: https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf

  10. #10
    Registered User
    Join Date
    Apr 2017
    Posts
    80
    Originally posted by BpB:
    So, what would a higher level of artificial intelligence be?
    Missed out the word test there, I meant:
    So, what would a higher level of artificial intelligence TEST be?

    Essay writing? That's the whole point of essays. I'm talking about the kind you do for university. You paraphrase, at a fairly high level (as in general meaning), not sentence level, in order to demonstrate understanding. But then judging an essay isn't so quick/easy.
    Last edited by BpB; 01-20-2018 at 11:26 AM.
