A program I'm working on has records (structs), with an array of ints for every record, with 52 integers. So it's:
The BIG part is the number of records going into an array of these - at least 100 million have to be processed, in several large "chunks".Code:struct person { char name[30]; int data[50]; }
My problem, is in the sorting - the data integers can't be moved from one index to the next - their position within the data array of each record, is fixed. The whole record has to be sorted, with the key being the first column, then the second column, then the third column, etc. Time and memory constraints are obviously a consideration, but it will be running on strong desktop or server class systems.
The first 50 records of Data look like this, when sorted properly:
Records sorted: 1,000,000 Columns: 52 Int's limited to 999 here, but will range up to 300,000 at least, in the actual data.Code:0: 0 1 474 753 478 945 456 255 1 198 847 945 551 24 396 986 242 85 154 1: 0 3 299 465 382 737 460 803 301 771 179 50 801 262 483 330 815 905 349 2: 0 3 345 44 796 23 873 36 883 551 419 744 744 659 336 470 891 699 16 3: 0 3 849 827 747 316 243 938 517 197 597 700 653 425 613 307 298 446 293 4: 0 3 925 53 709 784 756 745 423 178 670 93 311 837 321 362 612 460 455 5: 0 4 873 86 923 163 182 771 644 454 353 918 845 682 210 997 429 755 772 6: 0 6 417 387 904 42 848 60 345 464 642 849 576 143 770 716 331 564 186 7: 0 6 832 185 660 326 499 338 910 34 978 664 625 260 107 298 186 979 132 8: 0 8 348 612 127 559 495 254 805 122 677 394 373 613 243 810 686 111 285 9: 0 10 154 192 615 536 896 45 15 445 418 637 784 145 829 849 351 313 243 10: 0 11 522 647 514 445 715 990 98 406 592 955 431 266 864 113 107 958 138 11: 0 12 333 455 930 249 427 629 609 426 967 981 928 488 462 97 778 363 331 12: 0 12 361 19 428 960 195 48 826 329 672 520 579 620 729 422 291 659 187 13: 0 16 767 659 471 118 680 993 124 49 583 252 279 107 962 435 192 486 419 14: 0 16 966 710 733 518 635 786 823 48 252 343 111 925 984 362 707 572 591 15: 0 17 654 931 719 475 350 774 603 248 61 441 451 212 806 450 635 904 114 16: 0 19 691 146 561 202 679 1 752 965 701 25 437 870 898 537 858 575 464 17: 0 20 976 306 427 291 532 412 19 157 639 885 976 321 56 268 570 518 302 18: 0 21 193 144 243 660 18 852 615 723 330 703 663 576 50 298 584 875 131 19: 0 22 398 74 875 882 38 249 931 199 732 32 154 573 960 950 950 909 364 20: 0 22 405 307 198 711 412 774 394 432 396 319 93 783 994 134 963 504 622 21: 0 24 115 423 464 806 1 457 739 253 135 909 456 590 381 893 357 742 608 22: 0 25 450 853 240 990 249 169 798 10 533 545 534 449 522 559 104 926 153 23: 0 25 463 335 908 489 405 391 51 640 763 720 221 598 413 634 383 690 243 24: 0 27 533 441 604 117 202 83 94 775 308 738 551 248 638 733 393 156 543 25: 0 27 796 943 401 592 894 783 425 326 940 523 286 8 198 229 636 702 509 26: 0 28 110 295 95 114 66 871 57 989 475 766 940 929 759 376 940 943 688 27: 0 29 557 109 343 650 105 219 889 460 148 18 183 213 313 700 998 231 515 28: 0 30 296 196 704 905 178 308 377 51 542 352 819 922 371 855 996 923 542 29: 0 32 442 885 57 918 597 860 546 727 783 814 97 67 365 817 839 852 74 30: 0 32 771 597 742 99 255 632 643 316 317 644 298 256 532 41 594 638 565 31: 0 33 829 853 851 720 993 822 6 615 207 313 566 822 317 69 665 429 27 32: 0 33 920 848 920 180 651 515 264 420 461 143 67 827 522 216 382 989 748 33: 0 35 130 640 960 325 881 218 315 962 453 390 269 124 489 881 648 291 888 34: 0 36 149 297 104 248 995 599 644 505 938 811 788 485 302 425 807 345 124 35: 0 38 145 488 51 719 269 136 572 914 193 874 845 189 351 786 411 689 6 36: 0 38 837 687 725 958 108 502 753 944 794 298 799 326 428 118 636 828 392 37: 0 39 17 818 886 43 629 135 656 657 563 340 437 664 281 735 105 870 434 38: 0 39 215 110 391 412 970 531 564 589 967 247 158 259 569 331 14 598 607 39: 0 39 312 757 916 764 84 497 901 660 18 99 268 832 946 102 857 302 966 40: 0 39 735 769 860 92 76 150 852 699 646 618 938 542 895 829 256 258 225 41: 0 41 996 250 385 420 35 814 80 551 931 869 269 322 467 955 265 367 652 42: 0 42 600 55 374 355 294 176 893 338 284 30 887 894 742 663 411 157 589 43: 0 43 668 426 329 264 299 577 309 623 788 447 224 71 933 250 800 474 8 44: 0 43 989 71 332 696 433 770 549 501 581 131 731 51 62 243 609 396 285 45: 0 44 485 672 957 33 198 423 190 280 32 379 907 409 854 971 399 827 798 46: 0 46 565 684 262 715 952 357 769 142 1 865 681 928 930 606 817 346 441 47: 0 47 282 598 719 73 363 76 541 103 96 96 224 104 926 446 330 17 861 48: 0 47 389 253 895 940 753 57 486 288 265 443 918 581 678 782 572 132 293 49: 0 47 559 174 135 228 645 777 669 345 798 491 742 553 907 401 179 818 539 50: 0 48 840 318 31 66 518 893 533 158 922 718 419 981 538 777 462 363 702
This takes me about 18 seconds, (on an i7) with Quicksort doing the first column, but since it's not aware of the fixed data constraint, I have a custom code block that handles all the rest of the columns data.
I can't use an algorithm that requires a large amount of auxiliary memory. "Large" is not defined, however.
I tried changing the data integers into strings, and sorting them that way - the times were no better, however. (using an optimized Quicksort + Insertion sort). Radix should do a better job, but Radix has some peculiarities that can make me review my list of 4 letter words.
Column sorting like this is new to me, and Google searches have not been helpful. It's not the same thing as a "stable" sort that's needed here. If you know the correct terminology for this, please let me know. It's a multi-key sort, with 51 sub keys, I guess.
Any algorithmic tips? I'm not looking for code.
Thanks, and Best Wishes to all in 2013!