Syntax Highlighting for English?

**KONI** · 05-31-2007

Let's throw a bit of theory into this, shall we:

Computational processing of textual data

Interesting parts:

Part-of-Speech tagging (Brill, HMM)

Part-of-Speech tagging (HMM, continued)

Then I recommend "Parsing, formal grammars" and "Stochastic Parsing", as well as "Classification - Visualization". As long as we don't share the same theoretical background, there's no use discussing anything. Personally, I took courses in computational processing of textual data and in "natural language processing" (audio), and some of the opinions/beliefs in this thread amuse me.
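For readers without that course background, here is a minimal sketch of the HMM approach named in the slides listed above: a Viterbi decoder over a toy hidden Markov model. The five-tag set and every probability are invented for illustration (not estimated from any corpus), and the names `TAGS`, `TRANS`, `EMIT` and `viterbi` are hypothetical.

```python
# Toy HMM part-of-speech tagger decoded with the Viterbi algorithm.
# ASSUMPTION: the tag set and every probability are invented for
# illustration; a real tagger estimates them from a tagged corpus.

TAGS = ["PRON", "VERB", "DET", "NOUN", "PREP"]

# Transition probabilities P(tag | previous tag); "START" precedes word 1.
# Pairs missing from a row are treated as probability 0.
TRANS = {
    "START": {"PRON": 0.6, "DET": 0.3, "NOUN": 0.1},
    "PRON": {"VERB": 0.9, "NOUN": 0.1},
    "VERB": {"DET": 0.6, "PREP": 0.2, "NOUN": 0.2},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"PREP": 0.4, "VERB": 0.4, "NOUN": 0.2},
    "PREP": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}

# Emission probabilities P(word | tag); "hits" and "short" are ambiguous.
EMIT = {
    "PRON": {"he": 1.0},
    "VERB": {"hits": 0.5, "short": 0.2},
    "DET": {"a": 1.0},
    "NOUN": {"hits": 0.1, "grounder": 0.3, "short": 0.2},
    "PREP": {"to": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for a list of lowercase words."""
    # best[i][tag] = (probability of the best path ending in `tag` at word i,
    #                 tag of word i - 1 on that path)
    best = [{}]
    for tag in TAGS:
        prob = TRANS["START"].get(tag, 0.0) * EMIT[tag].get(words[0], 0.0)
        best[0][tag] = (prob, None)

    for i in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            emit = EMIT[tag].get(words[i], 0.0)
            # Keep only the predecessor that maximizes the path probability.
            best[i][tag] = max(
                (best[i - 1][prev][0] * TRANS[prev].get(tag, 0.0) * emit, prev)
                for prev in TAGS
            )

    # Start from the most probable final tag and follow the back-pointers.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("he hits a grounder to short".split()))
# → ['PRON', 'VERB', 'DET', 'NOUN', 'PREP', 'NOUN']
```

With these made-up numbers "short" comes out as a noun because PREP → NOUN (0.4) outweighs PREP → VERB (0.1); the model picks the most probable reading rather than ruling the other one out.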

**Govtcheez** · 05-31-2007

Originally Posted by KONI

some opinions/beliefs in this thread amuse me.

How's about you and brewbuck drop a little science and enlighten us? That's certainly more productive than saying "I'm right hahaha"

**MacGyver** · 05-31-2007

This idea is interesting.

**Sang-drax** · 05-31-2007

brewbuck, if you encountered a language you've never seen before, for example Hindi, then I am sure it would be impossible for you to figure out which words are nouns. You would probably not even recognize the characters used.

It is impossible for you to determine where the nouns are -- the information just isn't there.

But! After syntax highlighting, done by a human or with an extensive database of words and grammar, it would be very easy for you to see the nouns -- they would be bold. There's your extra information.
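A minimal sketch of the database-driven highlighting described here, assuming a tiny hand-made dictionary that maps each word to exactly one tag and marks nouns with `**bold**`. `LEXICON` and `highlight_nouns` are hypothetical names, and the entries are illustrative only.

```python
import re

# ASSUMPTION: a tiny hand-made lexicon standing in for the "extensive
# database of words"; each word maps to a single part-of-speech tag.
LEXICON = {
    "he": "PRON", "hits": "VERB", "a": "DET",
    "grounder": "NOUN", "to": "PREP", "short": "ADJ",
}


def highlight_nouns(text):
    """Wrap every word the lexicon tags as NOUN in **bold** markup."""
    def mark(match):
        word = match.group(0)
        if LEXICON.get(word.lower()) == "NOUN":
            return f"**{word}**"
        return word

    # [A-Za-z]+ matches runs of ASCII letters; spaces and punctuation
    # pass through unchanged.
    return re.sub(r"[A-Za-z]+", mark, text)


print(highlight_nouns("He hits a grounder to short."))
# → He hits a **grounder** to short.
```

Because this lexicon stores one tag per word, "short" stays unbolded even in a sentence where it is used as a noun; a plain word-to-style lookup cannot see context.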

The problem here is that you don't seem to acknowledge the external database/human experience as additional information. I don't really understand why, but obviously you must have some deep insight into information theory that I don't. I skipped that course. Feel free to enlighten me with arguments other than which subjects I don't understand.

**Desolation** · 05-31-2007

I don't get how that could ever be helpful... When I'm reading, I don't give a damn if a word is a verb or a noun, I just naturally know it. English isn't even my native language... Awful idea.

**vart** · 05-31-2007

Originally Posted by Desolation

I don't get how that could ever be helpful... When I'm reading, I don't give a damn if a word is a verb or a noun, I just naturally know it. English isn't even my native language... Awful idea..

Sometimes, to understand a sentence correctly, you HAVE to know whether a word is a noun or a verb - it can greatly change the meaning of the sentence... Not everyone can do that easily in a foreign language... especially in English, which does not bother to distinguish verbs from nouns with extra word endings...

**KONI** · 06-01-2007

You're completely forgetting the fact that natural language is by definition ambiguous.
Sometimes writers don't want you to know whether something is a verb or a noun, and they make full use of the two very different meanings the phrase could have to express a more subtle message.

Go literature! \o/

**zacs7** · 06-01-2007

I agree that it's interesting, but it doesn't really have any use other than a trivial one.

**Sang-drax** · 06-01-2007

Originally Posted by KONI

You're completely forgetting the fact that natural language is by definition ambiguous.
Sometimes writers don't want you to know if something is a verb or a noun and they make full usage of the two very different meanings the phrase would have to express a more subtle message.

Yeah, but seriously, how common is that?

**Decrypt** · 06-02-2007

All the time, if you consider slang and colloquialisms.

"He hits a grounder to short."
"I need to code for a while if I'm ever going to finish this project."

First of all, grounder isn't even a "proper" English word (I think). Second, "short" is usually an adjective; here it's a noun. The word "code" in the second sentence is usually a noun, but, as we all know, it's used as a verb quite often. Using a simple dictionary and assigning each word a color would never work. One of the things I always hear about English is that it's a hard language to learn because it has so many "exceptions to the rule." Because of that, this is going to be an immense project.

You may have to implement it by looking at the general form, instead. By looking at the sentence structure, I think you can determine that "hits" is a verb because "He" is a pronoun. If "hits" is a verb, then "grounder" must be a noun, since "a" is a determiner (pretty sure, anyway). If all of this is true, we can assume that "short" is a noun, since "to" is a preposition.

I did not really double check to make sure all of that is correct, but you get the idea. Instead of focusing on what each word means, you'd probably have to start with the underlying structure, and determine where the determiners, prepositions, and the like are. Then you can build up to the nouns and verbs. Probably. Maybe.
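The "closed-class words first, then build up" idea can be sketched as a two-pass tagger. The word list and the three context rules below are hypothetical and hand-written to mirror the reasoning in the previous post (pronoun → verb, determiner → noun, preposition → noun); `CLOSED_CLASS`, `CONTEXT_RULES` and `tag_by_structure` are illustrative names, not an established method.

```python
# ASSUMPTION: a hypothetical, hand-written list of closed-class words
# (pronouns, determiners, prepositions); every other word starts unknown.
CLOSED_CLASS = {
    "he": "PRON", "she": "PRON", "it": "PRON",
    "a": "DET", "an": "DET", "the": "DET",
    "to": "PREP", "in": "PREP", "on": "PREP",
}

# The heuristics from the post: the previous word's tag predicts this one.
CONTEXT_RULES = {"PRON": "VERB", "DET": "NOUN", "PREP": "NOUN"}


def tag_by_structure(words):
    """Tag closed-class words first, then infer open-class tags from context."""
    # Pass 1: mark the words whose use is fixed; everything else is None.
    tags = [CLOSED_CLASS.get(w.lower()) for w in words]
    # Pass 2: fill each unknown slot from the tag immediately before it.
    for i in range(1, len(tags)):
        if tags[i] is None:
            tags[i] = CONTEXT_RULES.get(tags[i - 1], "UNKNOWN")
    # A sentence-initial open-class word has no left context to use.
    if tags and tags[0] is None:
        tags[0] = "UNKNOWN"
    return list(zip(words, tags))


print(tag_by_structure("He hits a grounder to short".split()))
# → [('He', 'PRON'), ('hits', 'VERB'), ('a', 'DET'), ('grounder', 'NOUN'), ('to', 'PREP'), ('short', 'NOUN')]
```

Treating "to" as always a preposition is a deliberate simplification here; a fixed PREP → NOUN rule has no way to represent the reading in which "to short" is a verb.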

Since there is a set of rules for grammar, it can probably be done. It will be no small task, though. As an ESL teaching tool, it might be pretty useful, actually, since it focuses on the abstract sentence structure, instead of something like "this word is a noun, except in the following cases..."

**Rashakil Fol** · 06-02-2007

Originally Posted by Decrypt

You may have to implement it by looking at the general form, instead. By looking at the sentence structure, I think you can determine that "hits" is a verb because "He" is a pronoun. If "hits" is a verb, then "grounder" must be a noun, since "a" is a determiner (pretty sure, anyway). If all of this is true, we can assume that "short" is a noun, since "to" is a preposition.

No, "to short" may also be a verb there.

**whiteflags** · 06-02-2007

Originally Posted by Rashakil Fol

No, there "to short" may also be a verb.

No, Decrypt's point about American English is correct. In baseball, the shortstop is sometimes called "short" (especially by announcers). In the sentence "He hit a grounder to short," the shortstop is the indirect object. Provided an ESL student has any inkling of what those are, he should be able to understand that it is a noun. But this is where the syntax highlighter could come in and clear things up, because "to short" is not a verb; "to shorten" is.

Ah, how baseball has impacted American culture....

**Rashakil Fol** · 06-02-2007

Originally Posted by citizen

to short is not a verb; to shorten is.

Play with electricity much? Or stocks?

**whiteflags** · 06-02-2007

Fair enough. Still, there's context to wrestle with and I don't think that particular sentence makes sense in any other way. I could be wrong, but it doesn't go against his real point anyway. The machine would have to be able to understand context to work perfectly, and I think he knows that.

**Decrypt** · 06-03-2007

Originally Posted by Rashakil Fol

No, there "to short" may also be a verb.

On its own it can be, but in context I'm not sure that's the case. If we've determined that the sentence so far is <pronoun> <verb> <determiner> <noun>, I'm not sure that the infinitive "to short" makes sense there, assuming complete sentences. You're right, the sentence could just as easily read "He sells a stock to short," but that sentence is incomplete; another noun is required, I think: "He sells a stock to short it." Because of that, I think something like the above analysis would work. This is English we're talking about, though, so nearly anything is possible.

The idea is that, if this highlighter is to work, I think that it'd have to use general grammatical rules instead of a strict set of uses for each word as mentioned above. However, to start, you'd have to have some set of words whose use is iron-clad, and, in the end, you'd probably have to use both a set of grammatical rules and a database of words and their uses to implement it properly.
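Combining a word database with grammatical rules is roughly the idea behind Brill-style transformation-based tagging, one of the techniques listed at the start of the thread. A minimal sketch, assuming a hand-written lexicon of each word's most frequent tag and three hand-written rules (real Brill taggers learn their rules from a tagged corpus). `LEXICON`, `RULES`, `brill_tag` and the `INF` tag for an infinitive "to" are hypothetical.

```python
# Sketch of Brill-style transformation-based tagging: a lexicon supplies
# each word's most frequent tag, then ordered contextual rules patch mistakes.
# ASSUMPTION: the lexicon and all rules are hand-written for illustration.

# Most frequent tag for each known word (the "database of words").
LEXICON = {
    "he": "PRON", "i": "PRON", "hits": "VERB", "need": "VERB",
    "a": "DET", "grounder": "NOUN", "code": "NOUN",
    "to": "PREP", "short": "ADJ",
}

# (from_tag, to_tag, tag the previous word must have): the "grammatical rules".
RULES = [
    ("ADJ", "NOUN", "PREP"),  # "to short": adjective after a preposition -> noun
    ("PREP", "INF", "VERB"),  # "need to": preposition after a verb -> infinitive marker
    ("NOUN", "VERB", "INF"),  # "to code": noun after an infinitive marker -> verb
]


def brill_tag(words):
    """Tag words by lexicon lookup, then apply each rule in order."""
    # Initial pass: most frequent tag; unknown words default to NOUN.
    tags = [LEXICON.get(w.lower(), "NOUN") for w in words]
    # Rules run in list order, so each rule sees the corrections made
    # by the rules before it.
    for from_tag, to_tag, prev_tag in RULES:
        for i in range(1, len(tags)):
            if tags[i] == from_tag and tags[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


print(brill_tag("He hits a grounder to short".split()))
# → ['PRON', 'VERB', 'DET', 'NOUN', 'PREP', 'NOUN']
print(brill_tag("I need to code".split()))
# → ['PRON', 'VERB', 'INF', 'VERB']
```

Hand-written rules like these over-generalize: the second rule would also turn the preposition in "walks to school" into an infinitive marker, which is why Brill's method scores candidate rules against a corpus instead of trusting intuition.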