Originally Posted by
RagingOrc
An example of a command parsed could be
ldc 5
a more complicated command could be something like...
Load Accum: ldc 5 ; a label and an instruction
You could use a regular expression engine. First write out some sample commandlines that might be interpreted by your assembler:
Code:
label1: mov ax 4 ; here is a comment
mov bx 123;here is another comment
a:push 456 ;...
Generalize each possible commandline with the different combinations of components that might occur:
(1) LABEL: CMD ARG1 ARG2 ;COMMENT
(2) CMD ARG1 ARG2 ;COMMENT
(3) LABEL: CMD ARG1 ;COMMENT
...
(N)
If a commandline matches any of these N cases, it is a valid commandline. Otherwise it is invalid.
Translate each case 1..N into a regular expression and use capture groups to pick out each component you are interested in. Here is a pseudocode outline:
Code:
// (Case 1) LABEL: CMD ARG1 ARG2 ;COMMENT
if Commandline matches regexp ^\s*(\w+):\s*(\w+)\s+(\w+)\s+(\w+)\s*;(.*)$
    Label := capture[1]
    Cmd := capture[2]
    Arg1 := capture[3]
    Arg2 := capture[4]
    Comment := capture[5]
// (Case 2) CMD ARG1 ARG2 ;COMMENT
else if Commandline matches regexp ^\s*(\w+)\s+(\w+)\s+(\w+)\s*;(.*)$
    Label := NONE
    Cmd := capture[1]
    Arg1 := capture[2]
    Arg2 := capture[3]
    Comment := capture[4]
// ...
else
    print "Invalid commandline: ", Commandline
end if
There are several regex libraries that provide these matching abilities and are callable from C. One example is PCRE.
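To make the outline a bit more concrete, here is a rough C sketch of cases 1 to 3. Note that it uses the POSIX <regex.h> functions (regcomp/regexec) rather than PCRE, only because they come with the C library on most Unix-like systems. POSIX extended regexps have no portable \s or \w, so the patterns use [[:space:]] and [[:alnum:]_] instead. The names parsed_line, parse_line, try_case, copy_group and FIELD_LEN are made up for this example, and the fixed field size is an assumption.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

#define FIELD_LEN 64                 /* assumed maximum size of one component */
#define WS "[[:space:]]"             /* stands in for \s */
#define ID "([[:alnum:]_]+)"         /* stands in for (\w+) */

/* Hypothetical container for the components of one command line. */
struct parsed_line {
    char label[FIELD_LEN];           /* empty string when there is no label */
    char cmd[FIELD_LEN];
    char arg1[FIELD_LEN];
    char arg2[FIELD_LEN];            /* empty string when there is no second argument */
    char comment[FIELD_LEN];
};

/* Copy one capture group out of line into dst, truncating if it is too long. */
static void copy_group(char *dst, const char *line, const regmatch_t *m)
{
    size_t len = 0;
    if (m->rm_so >= 0) {
        len = (size_t)(m->rm_eo - m->rm_so);
        if (len > FIELD_LEN - 1)
            len = FIELD_LEN - 1;
        memcpy(dst, line + m->rm_so, len);
    }
    dst[len] = '\0';
}

/* Compile pattern, match it against line, fill m[0..nmatch-1]. Returns 1 on a match. */
static int try_case(const char *pattern, const char *line,
                    regmatch_t *m, size_t nmatch)
{
    regex_t re;
    int rc;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    rc = regexec(&re, line, nmatch, m, 0);
    regfree(&re);
    return rc == 0;
}

/* Returns 1 and fills *out if line matches one of the cases, otherwise 0. */
int parse_line(const char *line, struct parsed_line *out)
{
    regmatch_t m[6];                 /* m[0] is the whole match, m[1..5] the groups */

    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    /* (Case 1) LABEL: CMD ARG1 ARG2 ;COMMENT */
    if (try_case("^" WS "*" ID ":" WS "*" ID WS "+" ID WS "+" ID WS "*;(.*)$",
                 line, m, 6)) {
        copy_group(out->label, line, &m[1]);
        copy_group(out->cmd, line, &m[2]);
        copy_group(out->arg1, line, &m[3]);
        copy_group(out->arg2, line, &m[4]);
        copy_group(out->comment, line, &m[5]);
        return 1;
    }
    /* (Case 2) CMD ARG1 ARG2 ;COMMENT */
    if (try_case("^" WS "*" ID WS "+" ID WS "+" ID WS "*;(.*)$", line, m, 5)) {
        copy_group(out->cmd, line, &m[1]);
        copy_group(out->arg1, line, &m[2]);
        copy_group(out->arg2, line, &m[3]);
        copy_group(out->comment, line, &m[4]);
        return 1;
    }
    /* (Case 3) LABEL: CMD ARG1 ;COMMENT */
    if (try_case("^" WS "*" ID ":" WS "*" ID WS "+" ID WS "*;(.*)$", line, m, 5)) {
        copy_group(out->label, line, &m[1]);
        copy_group(out->cmd, line, &m[2]);
        copy_group(out->arg1, line, &m[3]);
        copy_group(out->comment, line, &m[4]);
        return 1;
    }
    return 0;                        /* no case matched: invalid command line */
}
```

Calling parse_line on the three sample lines above should succeed, and anything that fits none of the three patterns (including a line without a comment) is rejected until you add more cases. The sketch recompiles each pattern on every call to keep it short; a real assembler would probably compile them once up front.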