Thread: Created a new class, weird performance. Too fast in my opinion

  1. #1
    Registered User
    Join Date
    Feb 2009
    Posts
    40

    Created a new class, weird performance. Too fast in my opinion

    Hi all. I have just written a class called VirtualFileSystem, but there is a problem: it is too fast, in my opinion. Can anyone take a quick look and see if anything looks wrong?
    Code:
    class VirtualFileSystem
    {
        private:
        mutable std::ifstream infilestream;
        mutable std::ofstream outfilestream;
        mutable std::multimap<MD5String, std::map<unsigned long long int, unsigned short int>::iterator> hashdata;
        mutable std::map<unsigned long long int, unsigned short int> numberdata;
    
        std::vector<unsigned long long int> remove;
        std::size_t blocksize;
        char* block;
        std::string filename;
        public:
        VirtualFileSystem(const std::string &,unsigned short int,bool);
        ~VirtualFileSystem();
    
        void init(const std::string&);
        void shutdown(const std::string&) const;
    
        unsigned long long int add(const char*,unsigned short int);
    
        unsigned long long int get(const char*, unsigned short int) const;
    
        std::vector<unsigned char> get(const unsigned long long int&) const;
    
        std::size_t GetSize() const;
    
    
    };
    
    
    
    VirtualFileSystem::VirtualFileSystem(const std::string & f, unsigned short int bsize = 8192, bool i = true)
        : filename(f), outfilestream(f.c_str(),std::ios::binary | std::ios::app), infilestream(f.c_str(),std::ios::binary), blocksize(1024*8)
    {
        block = new char[bsize];
        if(i)
        {
            init(f + ".ini");
        }
    }
    VirtualFileSystem::~VirtualFileSystem()
    {
        delete [] block;
        shutdown((filename + ".ini").c_str());
    }
    
    void VirtualFileSystem::init(const std::string& f)
    {
        hashdata.clear();
        std::ifstream stream(f.c_str(),std::ios::binary);
        MD5String md5;
        unsigned long long int num;
        unsigned short int length;
        std::multimap<unsigned long long int, unsigned short int>::iterator it;
        while(stream >> md5 >> num >> length)
        {
            numberdata[num] = length;
            hashdata.insert(std::make_pair(md5,numberdata.find(num)));
        }
    }
    
    void VirtualFileSystem::shutdown(const std::string& f) const
    {
        std::ofstream stream(f.c_str(),std::ios::binary | std::ios::trunc);
    
        std::multimap<MD5String, std::map<unsigned long long int, unsigned short int>::iterator>::iterator it;
        std::map<unsigned long long int, unsigned short int>::iterator it2;
        for(it = hashdata.begin(); it != hashdata.end(); it++)
        {
            it2 = (*it).second;
            stream << (*it).first << " " << (*it2).first << " " << (*it2).second;
        }
        outfilestream << std::flush;
    }
    
    unsigned long long int VirtualFileSystem::get(const char* data, unsigned short int size) const
    {
        if(size > blocksize)
        {
            return 0;
        }
        std::multimap<MD5String, std::map<unsigned long long int, unsigned short int>::iterator>::iterator it;
        std::map<unsigned long long int, unsigned short int>::iterator it2;
        MD5String md5(data,size);
        for(it = hashdata.find(md5); it != hashdata.end(); it++)
        {
            it2 = (*it).second;
            if(size != (*it2).second)
            {
                continue;
            }
            infilestream.seekg(blocksize*((*it2).first-1));
    
            infilestream.read(block,size);
            if(std::memcmp(block,data,size) == 0)
            {
                return (*it2).first;
            }
        }
        return 0;
    }
    
    unsigned long long int VirtualFileSystem::add(const char* data,unsigned short int size)
    {
        unsigned long long int ret = get(data,size);
        if(ret > 0)
        {
            return ret;
        }
        bool usedremove = false;
        if(remove.size() > 0)
        {
            ret = remove[0];
            usedremove = true;
        }
        else
        {
            infilestream.seekg(0,std::ios::end);
            ret = (float)(infilestream.tellg())/(float)blocksize + 1;
        }
        if(outfilestream.seekp(blocksize*(ret-1)) && outfilestream.write(data,size))
        {
            outfilestream.write(block,blocksize-size);
            numberdata[ret] = size;
            hashdata.insert(std::make_pair(MD5String(data,size),numberdata.find(ret)));
            if(usedremove)
            {
                remove.erase(remove.begin());
            }
        }
        outfilestream << std::flush;
        return ret;
    }
    
    std::vector<unsigned char> VirtualFileSystem::get(const unsigned long long int &pos) const
    {
        std::map<unsigned long long int, unsigned short int>::iterator it(numberdata.find(pos));
        if(it == numberdata.end())
        {
            return std::vector<unsigned char>();
        }
        infilestream.seekg(blocksize*(pos-1));
        infilestream.read(block,(*it).second);
    
        std::vector<unsigned char> ret(block,block + (*it).second);
        return ret;
    }
    
    
    inline std::size_t VirtualFileSystem::GetSize() const
    {
        return hashdata.size();
    }
    The MD5String class looks like this, if anyone really wants to know:
    Code:
    /*
    uses http://www.md5hashing.com/c++/
    */
    class MD5String
    {
        private:
        unsigned char md5str[16];
        static MD5 md5_po;
        public:
        MD5String(const char*, std::size_t);
        MD5String(const MD5String &);
        ~MD5String();
    
    
        void Update(const char*, std::size_t);
    
        MD5String & operator=(const MD5String&);
    
        friend std::ostream & operator<<(std::ostream &, const MD5String&);
        friend std::istream & operator>>(std::istream &, MD5String&);
    
        friend bool operator==(const MD5String &, const MD5String&);
        friend bool operator<(const MD5String &, const MD5String&);
        friend bool operator>(const MD5String &, const MD5String&);
    };
    MD5 MD5String::md5_po = MD5();
    
    MD5String::MD5String(const MD5String & str)
    {
        std::memcpy(md5str,str.md5str,16);
    }
    MD5String::MD5String(const char* data = 0, std::size_t size = 0)
    {
        if(data != 0)
        {
            Update(data,size);
        }
    }
    
    MD5String::~MD5String()
    {
    
    }
    
    MD5String & MD5String::operator=(const MD5String& str)
    {
        if(this == &str)
        {
            return *this;
        }
        std::memcpy(md5str,str.md5str,16);
        return *this;
    }
    std::ostream & operator<<(std::ostream & stream, const MD5String& str)
    {
        stream.write((char*)str.md5str,16);
        return stream;
    }
    std::istream & operator>>(std::istream & stream, MD5String& str)
    {
        stream.read((char*)str.md5str,16);
        return stream;
    }
    bool operator==(const MD5String & str1, const MD5String& str2)
    {
        return memcmp(str1.md5str,str2.md5str,16) == 0;
    }
    
    bool operator<(const MD5String & str1, const MD5String& str2)
    {
        return memcmp(str1.md5str,str2.md5str,16) == 1;
    }
    
    bool operator>(const MD5String & str1, const MD5String& str2)
    {
        return memcmp(str1.md5str,str2.md5str,16) == -1;
    }
    
    void MD5String::Update(const char* data, std::size_t size)
    {
        MD5_CTX ctx;
        md5_po.MD5Init(&ctx);
        md5_po.MD5Update(&ctx,(unsigned char*)data,size);
        md5_po.MD5Final(md5str,&ctx);
    }

    And besides, is this the "right" way to build something like this? Is there any way to make this class even faster that is not too hard to implement?

    Thanks in advance.
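
    One thing that stands out in get(const char*, unsigned short int) above: the loop starts at hashdata.find(md5) and runs all the way to hashdata.end(), so it can also visit entries stored under a different hash. The memcmp check keeps the result correct, but each extra visit may cost a disk read. A minimal sketch of limiting the scan to matching keys with std::multimap::equal_range follows. It uses int keys and std::string values as stand-ins for the real types, and `Index` and `values_for` are hypothetical names:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the hash index: int instead of MD5String, string instead of an iterator.
typedef std::multimap<int, std::string> Index;

// Collect only the values stored under `key`, ignoring every other key.
std::vector<std::string> values_for(const Index& index, int key)
{
    std::vector<std::string> out;
    // equal_range returns [first, second): exactly the entries whose key matches.
    std::pair<Index::const_iterator, Index::const_iterator> range = index.equal_range(key);
    for (Index::const_iterator it = range.first; it != range.second; ++it)
    {
        out.push_back(it->second);
    }
    return out;
}
```

    In the class itself, the same pattern would replace the find()/end() pair in the for loop, with the body left unchanged.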

  2. #2
    The larch
    Join Date
    May 2006
    Posts
    3,573
    How can something be too fast? Does it or does it not do what it is supposed to do (assuming you suspect it is too fast as it doesn't do its job)?

  3. #3
    C++ Witch laserlight
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by anon
    How can something be too fast?
    Implementations of cryptosystems that allow for timing attacks

    But yeah, it would be good to say more precisely how it does not work. Just dumping code and expecting people to dissect it is rather unrealistic when the code is relatively long.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User
    Join Date
    Feb 2009
    Posts
    40
    Well, I am sorry.

    The thing is that I have put 13.8 GB of data into the file, and in 1 second I can get 325099520 bytes back. If I am counting right, that is 325099520/(1024^2) = 310 MB per second. As far as I know, my hard drive is not that fast; I expected about 70 MB per second at most. I simply don't understand how it can return 310 MB per second.
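
    The arithmetic above can be double-checked with a few lines of C++. This is only a sketch: `mb_per_second` and `whole_blocks` are hypothetical helper names, and the 8192-byte block size is the default from the class in the first post.

```cpp
#include <cstdint>

// Hypothetical helper: throughput in MB/s, using 1 MB = 1024^2 bytes as in the post.
double mb_per_second(std::uint64_t bytes, double seconds)
{
    return static_cast<double>(bytes) / (1024.0 * 1024.0) / seconds;
}

// Hypothetical helper: number of whole blocks contained in a byte count.
std::uint64_t whole_blocks(std::uint64_t bytes, std::uint64_t block_size)
{
    return bytes / block_size;
}
```

    With the numbers from the post, mb_per_second(325099520, 1.0) comes to about 310.04, and whole_blocks(325099520, 8192) is exactly 39685, so the test read back 39685 full blocks in that second.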

  5. #5
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by idleman
    Well, I am sorry.

    The thing is that I have put 13.8 GB of data into the file, and in 1 second I can get 325099520 bytes back. If I am counting right, that is 325099520/(1024^2) = 310 MB per second. As far as I know, my hard drive is not that fast; I expected about 70 MB per second at most. I simply don't understand how it can return 310 MB per second.
    You said it was a virtual filesystem. Are you accessing the disk or not? I don't want to dissect all that code without more information.

    If you're writing to a virtual FS in RAM, I think 310 MB/sec is perfectly reasonable.

  6. #6
    Registered User
    Join Date
    Feb 2009
    Posts
    40
    No, I'm not using a virtual FS in RAM. I just called it VirtualFileSystem because I didn't know what else to name it. If everything were being read from a real virtual FS, I would be surprised at how slow it is. I have heard such networks/filesystems should be able to read and write data at 800 MB/sec or more ;P

    But does anyone have any tips on how I can increase the performance of the class? Does anyone know a hash function that returns an unsigned long long int? I think it would be much better than using MD5 as I do now.


    Thanks
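
    One well-known hash that yields a 64-bit integer is FNV-1a. The following is a minimal sketch, not a recommendation from this thread: the function name `fnv1a64` is made up here, and std::uint64_t assumes a compiler that provides <cstdint>. A 64-bit value collides far more easily than a 128-bit MD5 digest, but since get() already confirms a match with memcmp on the block data, a collision should only cost an extra read.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime.
std::uint64_t fnv1a64(const char* data, std::size_t size)
{
    std::uint64_t hash = 14695981039346656037ULL;      // FNV-1a 64-bit offset basis
    for (std::size_t i = 0; i < size; ++i)
    {
        hash ^= static_cast<unsigned char>(data[i]);  // mix in one byte
        hash *= 1099511628211ULL;                      // FNV 64-bit prime
    }
    return hash;
}
```

    Whether this is actually faster than MD5 for 8 KB blocks would need to be measured; it processes one byte per iteration.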

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Sorry, I can't help you with the specific questions you ask, but:
    Code:
    bool operator<(const MD5String & str1, const MD5String& str2)
    {
        return memcmp(str1.md5str,str2.md5str,16) == 1;
    }
    
    bool operator>(const MD5String & str1, const MD5String& str2)
    {
        return memcmp(str1.md5str,str2.md5str,16) == -1;
    }
    You really should not rely on memcmp() returning 1 or -1. The function is only guaranteed to return something greater than zero or less than zero if the memory is not equal. It is not unusual for such functions to return the difference between the first pair of bytes that differ, which need not be 1 or -1.
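
    A sketch of comparison operators that test only the sign of memcmp()'s result. `Digest` is a hypothetical stand-in for the 16-byte array inside MD5String. Note that this sketch maps a negative result to operator<, which is the usual convention; the code quoted above maps a positive result to operator<.

```cpp
#include <cstring>

// Hypothetical stand-in for the 16-byte digest held by MD5String.
struct Digest
{
    unsigned char bytes[16];
};

// Compare only the sign of memcmp's result, never its exact value.
bool operator<(const Digest& a, const Digest& b)
{
    return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) < 0;
}

// Greater-than is less-than with the arguments swapped.
bool operator>(const Digest& a, const Digest& b)
{
    return b < a;
}

// Equality is the one case where an exact value (zero) is guaranteed.
bool operator==(const Digest& a, const Digest& b)
{
    return std::memcmp(a.bytes, b.bytes, sizeof a.bytes) == 0;
}
```

    Containers such as std::map and std::multimap only use operator< (through std::less), so that is the operator whose correctness matters most for the lookups.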

    By the way, what method did you use to measure the time? If you use clock(), and you are on a Linux/Unix system, then you are measuring the consumed CPU time, and it's not impossible that the processing of your filesystem can be done in one second for that amount of data on a modern CPU - but it would take a lot longer in seconds of wallclock time.
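
    The difference between CPU time and wall-clock time described above can be seen by timing something that mostly waits. A minimal sketch, assuming a C++11 compiler (newer than this thread); `Timing`, `time_both` and `pretend_disk_wait` are hypothetical names, and the sleep merely stands in for blocking on disk I/O:

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type: the same interval measured two ways.
struct Timing
{
    double cpu_seconds;   // processor time consumed, from std::clock()
    double wall_seconds;  // elapsed real time, from std::chrono::steady_clock
};

// Run `work` once and measure it with both clocks.
template <typename Work>
Timing time_both(Work work)
{
    const std::clock_t cpu_start = std::clock();
    const std::chrono::steady_clock::time_point wall_start = std::chrono::steady_clock::now();

    work();

    const std::clock_t cpu_end = std::clock();
    const std::chrono::duration<double> wall = std::chrono::steady_clock::now() - wall_start;

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = wall.count();
    return t;
}

// Stand-in for waiting on the disk: sleeping consumes almost no CPU time.
void pretend_disk_wait()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

    On Linux, time_both(pretend_disk_wait) would typically report a wall_seconds of about 0.05 and a cpu_seconds close to zero, which is the gap described above.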

    --
    Mats
