Thread: strspn problems with '#' character.

  1. #16
    Registered User
    Join Date
    Apr 2021
    Posts
    140
    Consider this:

    1. Create a const, static table of character mappings. It would map the entire 8-bit character range to either itself or '.'

    2. On entry, loop over each character in the input string. Pass the input chars through the table.

    3. If the table value is '.', then

    3a. If TV, then call the add_character function as now.

    3b. If no TV, store the dot.

    4. If the table value is not '.', just store the result.
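
    Steps 1 to 4 above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: `fix_table`, `init_fix_table` and `fix_name` are hypothetical names, the table is filled at run time rather than with initializers, and the keep/dot/drop policy is only a guess at one possible answer to the policy questions. The TV branch (step 3a) is left out because `add_character` is not shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical table: each byte maps to itself, to '.', or to 0 (drop). */
static char fix_table[256];

/* Fill the table once. Assumed policy: ASCII letters, digits, space,
 * '-' and '_' map to themselves; other ASCII punctuation maps to '.';
 * control characters and high-bit bytes are dropped. */
static void init_fix_table(void)
{
    for (int c = 0; c < 256; ++c) {
        if (c < 0x80 && (isalnum(c) || c == ' ' || c == '-' || c == '_'))
            fix_table[c] = (char)c;
        else if (c < 0x80 && ispunct(c))
            fix_table[c] = '.';
        else
            fix_table[c] = 0;
    }
}

/* Copy src into dst (capacity dst_size), passing every byte through the
 * table. Returns the number of bytes written, not counting the '\0'. */
static size_t fix_name(const char *src, char *dst, size_t dst_size)
{
    assert(dst_size > 0);
    size_t n = 0;
    for (; *src != '\0' && n + 1 < dst_size; ++src) {
        /* Cast to unsigned char so high-bit bytes index 128..255. */
        char mapped = fix_table[(unsigned char)*src];
        if (mapped != 0)
            dst[n++] = mapped;
    }
    dst[n] = '\0';
    return n;
}
```

    Call `init_fix_table()` once at startup, then `fix_name(input, buffer, sizeof buffer)` for each name. Changing the policy only means changing how the table is filled; the loop stays the same.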


    This brings two questions to mind. First, what is your policy for "out of range" characters, like all the ASCII values below space? Is it okay to have a filename with newlines or tabs in it? Second, what is the status of dot in filenames? If you have a filename like foo.txt, is the dot going to be added to the SPECIAL characters list? Or is dot already handled before this function is called?

  2. #17
    Registered User
    Join Date
    Apr 2019
    Posts
    121
    Quote Originally Posted by aghast View Post
    This brings two questions to mind. First, what is your policy for "out of range" characters, like all the ASCII values below space? Is it okay to have a filename with newlines or tabs in it? Second, what is the status of dot in filenames? If you have a filename like foo.txt, is the dot going to be added to the SPECIAL characters list? Or is dot already handled before this function is called?
    If the questions are from you wanting me to 'consider this', then maybe you should answer them.

    As for non-displayed ASCII characters, how many files have you downloaded or created that have non-displayed ASCII characters in them? And why are you trying to create them that way?!?

    And the period, or 'dot' as you put it, in the filenames is interpreted as a space. And a period can be encoded for things like `Mr.`, `vs.` or even '...'. Extensions are ignored.

    Quote Originally Posted by aghast View Post
    is the dot going to be added to the SPECIAL characters list
    It's not in the list, and no it won't be added.

  3. #18
    Registered User
    Join Date
    Apr 2021
    Posts
    140
    Quote Originally Posted by Yonut View Post
    If the questions are from you wanting me to 'consider this', then maybe you should answer them.

    As for non-displayed ASCII characters, how many files have you downloaded or created that have non-displayed ASCII characters in them? And why are you trying to create them that way?!?

    And the period, or 'dot' as you put it, in the filenames is interpreted as a space. And a period can be encoded for things like `Mr.`, `vs.` or even '...'. Extensions are ignored.

    It's not in the list, and no it won't be added.
    I cannot answer the questions because they are policy questions, not technical ones. Nothing I have shown you will fail to work depending on how you decide - the questions are just things that invite a "management" decision rather than a "programming" decision.

    Non-displayed ASCII characters are such a problem that the GNU project has modified the CLI of many standard utilities to support accepting filenames delimited with ASCII NUL bytes, in an attempt to handle every possible flavor of filename. (Except, obviously, those with NULs in. ;-) Look for utilities that take a -0 (dash zero) flag.
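
    On the consuming side, NUL-delimited input of the kind produced by `find -print0` can be read one name at a time. The following is a minimal sketch; `read_nul_record` is a hypothetical helper, and truncating over-long names is a simplification rather than a recommendation.

```c
#include <assert.h>
#include <stdio.h>

/* Read one NUL-terminated record from fp into buf (capacity cap).
 * Returns the stored length, or -1 if the stream was already at EOF.
 * Bytes that do not fit are discarded (a simplification). */
static long read_nul_record(FILE *fp, char *buf, size_t cap)
{
    assert(cap > 0);
    size_t n = 0;
    int got_any = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        got_any = 1;
        if (c == '\0')          /* NUL ends the record; newlines do not */
            break;
        if (n + 1 < cap)
            buf[n++] = (char)c;
    }
    if (!got_any)
        return -1;
    buf[n] = '\0';
    return (long)n;
}
```

    Because only NUL ends a record, a name containing a newline or a tab comes through as a single record.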

    Handling the periods as spaces is fine, that would fit into the lookup table without any extra effort.

    Something like this:

    Code:
    #define DOT(ch)  [ch] = '.'
    #define SELF(ch) [ch] = (ch)
    #define NOP(ch) [ch] = 0
    
    static const char Fix_chars[256] = {
    // Non-graphic ASCII chars
    NOP(0), NOP(1), NOP(2), /* lots of these */
    SELF(' '), ... SELF('A'), ... SELF('Z'), ... SELF('a'), ... SELF('z'),
    // Specials
    DOT('\''), DOT(','), DOT(';'), DOT('"'), ..., DOT('@'),
    // High-bit chars? I have no idea.
    };
    
    
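    char *
    fix_filename(
        const char *filename,
        int for_tv)
    {
        char *new_name = sm_malloc(PATH_MAX);
        char *out = new_name; // declared here because it is used after the loop

        // NOTE: there is no bounds check against PATH_MAX; add one if
        // add_character() can make the name longer than the input.
        for (const char *in = filename; *in != '\0'; ++in) {
            // Index as unsigned char so high-bit bytes are not negative indexes
            int fix = Fix_chars[(unsigned char)*in];
            switch (fix) {
            case 0: // skip the character entirely
                break;
            case '.': // special processing
                if (!for_tv) { // step 3b: no TV, just store the dot
                    *out++ = '.';
                    break;
                }
                *out = '\0'; // end string here so add_character can find it
                add_character(*in, new_name);
                // Move forward to end-of-string, which might be *out already,
                // if add_character() adds nothing.
                while (*out != '\0')
                    ++out;
                break;
            default:
                *out++ = fix;
                break;
            }
        }
        *out = '\0';
        // optionally shrink allocated buffer
        char *n2 = realloc(new_name, strlen(new_name) + 1);
        return n2 ? n2 : new_name;
    }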

  4. #19
    Registered User
    Join Date
    Apr 2019
    Posts
    121
    Quote Originally Posted by aghast View Post
    Code:
    #define DOT(ch)  [ch] = '.'
    ...
    Don't remember asking people to do something for me. In fact, the OP was about how `strspn` has trouble handling the pound symbol `#`.

    And for the 'management' portion of your code, what happens when the file starts with a special character, and it's changed to a period, which makes the file hidden? Also each folder has at least one hidden file which lists the last played file in a given directory. So I don't display hidden files, which means I would eventually lose storage space to files that just disappear.

    It's nice you want to help me by giving me a fish. That's not my style. There are two types of people on the internet offering help... thinkers and regurgitators. Thinkers will help you understand how to fix the problem, regurgitators will just give you the answer. Because you don't have a proper understanding of the full scope of my problem, you can't possibly offer me a completed solution.

    But thanks?

  5. #20
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by Yonut View Post
    what happens when the file starts with a special character, and it's changed to a period, which makes the file hidden?
    Are you renaming the files themselves, or just displaying their names in some sort of user interface? If the former, I'd be careful not to overwrite existing files. For example, if you have two files "foo@bar" and "foo.bar", if you rename the first file it'll clobber the second one. If the latter, then it's up to your application whether it hides a file or not. If the actual filename doesn't start with a period, then don't hide it. Period.

  6. #21
    Registered User
    Join Date
    Apr 2019
    Posts
    121
    Quote Originally Posted by christop View Post
    Are you renaming the files themselves, or just displaying their names in some sort of user interface? If the former, I'd be careful not to overwrite existing files. For example, if you have two files "foo@bar" and "foo.bar", if you rename the first file it'll clobber the second one. If the latter, then it's up to your application whether it hides a file or not. If the actual filename doesn't start with a period, then don't hide it. Period.
    Why does it have to be the same process? Can't one fix the names and another display them?

    And I'm using `stat` before `rename` with the new filename, to make sure it isn't replacing an existing file. Why would I have included it in my short example that displays my original problem?
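
    One caveat with checking via `stat` and then calling `rename`: another process can create the target between the two calls. A possible alternative, sketched below, is to let the kernel refuse in one step with `link()` followed by `unlink()`. `rename_noclobber` is a hypothetical name, and the approach assumes both paths are on the same filesystem, that the filesystem supports hard links, and that the source is a regular file rather than a directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Rename old_path to new_path without replacing an existing file.
 * Returns 0 on success, -1 on failure with errno set
 * (EEXIST when new_path already exists). */
static int rename_noclobber(const char *old_path, const char *new_path)
{
    assert(old_path != NULL && new_path != NULL);

    /* link() fails with EEXIST instead of overwriting the target. */
    if (link(old_path, new_path) != 0) {
        int saved = errno;
        if (saved == EEXIST)
            fprintf(stderr, "not renaming %s: %s already exists\n",
                    old_path, new_path);
        errno = saved;          /* fprintf may have changed errno */
        return -1;
    }

    /* Both names now refer to the file; drop the old one. */
    if (unlink(old_path) != 0) {
        int saved = errno;
        unlink(new_path);       /* undo, so the file keeps a single name */
        errno = saved;
        return -1;
    }
    return 0;
}
```

    With this, the "foo@bar" and "foo.bar" case fails with `EEXIST` and leaves both files untouched.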

    I don't know about you, but sometimes I want to make a filename safe without obscuring what the original filename was. Getting rid of leading periods would hide the original filename. Instead, because this can be an automated program, the smart thing would be to change the files with a leading period but warn the user (or write to a log file) about each instance. Then the file is safe, it can be reverted to the original filename, and it wouldn't be lost because there'd be a record of its existence and location.
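
    The change-and-log idea in the paragraph above might look something like this. It is only a sketch: `unhide_name` is a hypothetical name, and replacing each leading period with '_' is an assumed choice, since the actual replacement is not stated.

```c
#include <assert.h>
#include <stdio.h>

/* If name starts with one or more '.', log the original name and then
 * overwrite each leading '.' with '_' (assumed replacement character).
 * Returns how many characters were changed. Note that "." and ".."
 * would also be rewritten; callers should skip those entries. */
static size_t unhide_name(char *name, FILE *log)
{
    assert(name != NULL && log != NULL);

    size_t dots = 0;
    while (name[dots] == '.')
        ++dots;
    if (dots == 0)
        return 0;

    /* Record the original before editing so the change can be reverted. */
    fprintf(log, "leading period replaced in: %s\n", name);
    for (size_t i = 0; i < dots; ++i)
        name[i] = '_';
    return dots;
}
```

    Passing `stderr` as the log warns the user directly; passing an opened log file keeps a record that can be used to restore the original names.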

    For the record, the original problem has been solved, and I just seem to be explaining things that aren't important. I may not return to this post. But thanks to those that helped.

  7. #22
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Quote Originally Posted by Yonut View Post
    I don't know about you, but sometimes I want to make a filename safe without obscuring what the original filename was.
    OK, but I still don't understand what you mean by "safe". All characters that you might find in a file name are "safe" (otherwise the OS wouldn't have let you use them in a file name in the first place).

    I just assumed you wanted to escape certain characters for HTML or some other user interface that treats those characters as special. That makes more sense to me than renaming a file from one safe name to another safe name. So if you're renaming files to make them safe (for some definition of "safe") then you're probably solving the problem in the wrong place, and the real problem is how the file names are handled in the shell (or a shell script). At least in Bash (and other Bourne shell derivatives) it's easy enough to handle any file name properly:

    Code:
    ls -l "$filename"
    That will work for any value of $filename, whether it has a pound sign or an asterisk or a linefeed or what have you.

  8. #23
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Aren't you forgetting about charsets? On SysV systems the use of UTF-8 is common, so a filename like ".txt" is a valid name. It works on Windows too (where a modified version of UTF-16 is used)...

    The only thing "safe" about filenames is that, in C/C++, they end with '\0'.

  9. #24
    Registered User
    Join Date
    Apr 2019
    Posts
    121
    Quote Originally Posted by christop View Post
    Quote Originally Posted by Yonut View Post
    I don't know about you, but sometimes I want to make a filename safe without obscuring what the original filename was.
    OK, but I still don't understand what you mean by "safe". All characters that you might find in a file name are "safe" (otherwise the OS wouldn't have let you use them in a file name in the first place).
    Really?!? So an exclamation mark (!) in a filename is safe to you? Or it's safe to you because you either quoted it or escaped the offending character. A safe filename does not need to be quoted or escaped, and can be sent directly to a Windows machine (if needed) without any problems with special characters on different OSs. When something is safe, you don't have to do anything to it or worry about it, ever.


    Quote Originally Posted by christop View Post
    That makes more sense to me than renaming a file from one safe name to another safe name.
    Why would I rename a safe filename to another safe filename?!? Am I being trolled here or is this a legitimate question to you?!?


    Quote Originally Posted by christop View Post
    So if you're renaming files to make them safe (for some definition of "safe") then you're probably solving the problem in the wrong place, and the real problem is how the file names are handled in the shell (or a shell script). At least in Bash (and other Bourne shell derivatives) it's easy enough to handle any file name properly
    Ahh you mean by quoting it. Yes, taking that extra step would make an unsafe filename 'safe'. And that would work for Bash, but I'm pretty sure I didn't come to a C programming forum because I'm writing this in Bash.


    Quote Originally Posted by christop View Post
    That will work for any value of $filename, whether it has a pound sign or an asterisk or a linefeed or what have you.
    You sure like to put ridiculous characters in your filenames. Glad I have a way to guard against it.

    Just out of curiosity, are you a Millennial or younger?

  10. #25
    Registered User
    Join Date
    May 2012
    Location
    Arizona, USA
    Posts
    948
    Spaces in a file name need to be escaped or quoted when used in a shell. Do you consider spaces to be "unsafe"? Are spaces "special"?

    Quote Originally Posted by Yonut View Post
    Just out of curiosity, are you a Millennial or younger?
    Why would my age matter? I've been putting spaces in file names since I started using *nix in the '90s, and I've never thought of them as "unsafe". I also used spaces in file names back when I still used Windows (which has supported spaces and long file names since '95). I never thought of them as unsafe in Windows either. Otherwise, Windows folders like "Documents and Settings" and "Program Files" would be considered "dangerous".

    My Unix systems professor in college threw a fit when I wrote defensive shell scripts with quotes around file name variables. My rationale is that any user might use my script, even someone migrating to/from Windows, where a file name might look like "/mnt/to/windows/Documents and Settings/Joe Bob/Cat Video!.wmv". Better to support "unsafe" characters than to force a user to rename their files only because your script can't handle them.
