Thread: xml file parsing in C

  1. #31
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    You could try an HTML parser. They're good at broken stuff.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  2. #32
    Registered User
    Join Date
    Nov 2006
    Posts
    65
    I think there are basically 4 special characters
    The XML file claims to be iso-8859-1, so you cannot just use it as if it were UTF-8. In iso-8859-1 every character is exactly one byte. In UTF-8, non-ASCII characters take two or more bytes; the letters with accent marks / diacritics that you posted (like everything else iso-8859-1 can represent above 0x7F) take exactly 2 bytes. So in iso-8859-1, if you know the hex value of the character you are looking for, you can always match it easily, as long as you don't play around with the encoding.
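
    To make the byte counts concrete, here is a minimal C sketch (C only because that is what the thread is about) that converts iso-8859-1 bytes to UTF-8. The function name latin1_to_utf8 and the caller-supplied output buffer are my own assumptions for illustration, not something from the posts above.

```c
#include <stddef.h>

/* Convert len bytes of ISO-8859-1 text in src to UTF-8 in dst.
 * dst must have room for 2 * len + 1 bytes (worst case: every input
 * byte is non-ASCII and expands to two bytes, plus the NUL).
 * Returns the number of bytes written, not counting the NUL. */
size_t latin1_to_utf8(const unsigned char *src, size_t len, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII: identical single byte in both encodings */
            dst[out++] = c;
        } else {
            /* 0x80-0xFF: code point U+0080-U+00FF, two bytes in UTF-8 */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

    For example, the single iso-8859-1 byte 0xE9 (é) comes out as the two UTF-8 bytes 0xC3 0xA9, which is why matching on the one-byte value only works while the data is still iso-8859-1.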

    You can try what Salem posted, but I don't think it will work very well, because as far as I know the dash ( - ) is only supposed to be used for simple ranges (like a-g for example). Here are some more things you can try:
    Code:
    $tag_contents =~ tr/A-Za-z0-9._ / /c;
      # This replaces every character except those listed with a space.
      # Add to the list anything you need to keep.
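
    If you end up doing the filtering in C instead, a rough equivalent of the tr line above could look like this. It is only a sketch: keep_whitelisted is a made-up name, and it assumes a NUL-terminated buffer of single-byte (ASCII or iso-8859-1) text.

```c
#include <ctype.h>

/* Replace, in place, every byte that is not an ASCII letter, an ASCII
 * digit, '.', '_' or a space with a single space. */
void keep_whitelisted(char *s)
{
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        /* Restrict isalnum to the ASCII range so locale settings cannot
         * let bytes above 0x7F through. */
        int keep = (c < 0x80 && isalnum(c)) || c == '.' || c == '_' || c == ' ';

        if (!keep)
            *s = ' ';
    }
}
```

    As with the tr version, extend the condition with any other characters you need to keep.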
    Or, to do things more correctly, you can use the Encode module ( http://perldoc.perl.org/Encode.html ). First decode the data from iso-8859-1. Then you should be able to match all your special characters by their Unicode code points if you like. Also, if you decode the data from iso-8859-1 and then encode it as ASCII, anything non-ASCII should automatically be replaced with "?". Instead of calling decode explicitly, you can try just changing the calls to open if you use PerlIO.
    Code:
    open my $in,  "<:encoding(iso-8859-1)", $infile  or die "Cannot open $infile: $!";
    open my $out, ">:encoding(ascii)",      $outfile or die "Cannot open $outfile: $!";
    # See the Encode and PerlIO docs for details.
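
    The "replace anything non-ASCII with ?" idea is also easy to sketch in C when the input is iso-8859-1, because there every byte above 0x7F is exactly one non-ASCII character. The function name latin1_to_ascii is hypothetical, and this only mimics the effect described above; it is not how Encode does it internally.

```c
#include <stddef.h>

/* Replace, in place, every non-ASCII byte (0x80-0xFF) in an ISO-8859-1
 * buffer with '?'. Returns how many bytes were replaced. */
size_t latin1_to_ascii(unsigned char *buf, size_t len)
{
    size_t replaced = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= 0x80) {
            buf[i] = '?';
            replaced++;
        }
    }
    return replaced;
}
```

    Do not run this on UTF-8 data: each non-ASCII character there spans several bytes, so you would get several "?" per character.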
