Thread: Search Engine C++ program

  1. #1
    Registered User
    Join Date
    Jan 2007
    Posts
    1

    Search Engine C++ program

    I'm looking to write a program that takes entries from a user, passes them to web search engines, and returns the results to the program. I've never worked with a C or C++ program that accessed any internet information, so I don't know where to begin. All the programming I've done has revolved around math, science, and graphics...so I'm kinda at a loss as to how to do something like this. I'm more than happy to tinker with it myself...but I don't know where to look for reference materials. If someone could point me to some tutorials or reference materials where I could learn what is available, it would be much appreciated.

  2. #2
    Hurry Slowly vart
    Join Date
    Oct 2006
    Location
    Rishon LeZion, Israel
    Posts
    6,788
    You may want to read this: http://code.google.com/enterprise/do...reference.html

    Then try to build a program that sends a simple HTTP request based on the suggested samples and parses the results the site returns.
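
    As a rough sketch of the "generate a request, parse the result" idea, the C++ below only builds the request text and picks apart a response string; it does not open a network connection. The path "/search" and the parameter name "q" are made up for illustration and are not taken from the linked reference, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Percent-encode a query value; only RFC 3986 "unreserved" characters pass through.
std::string url_encode(const std::string& text) {
    std::string out;
    for (unsigned char c : text) {
        bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                          (c >= '0' && c <= '9') ||
                          c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Build a minimal HTTP/1.1 GET request.
// ASSUMPTION: "/search" and "q" are placeholders; a real engine documents its own.
std::string build_get_request(const std::string& host, const std::string& terms) {
    assert(!host.empty());
    return "GET /search?q=" + url_encode(terms) + " HTTP/1.1\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";
}

// Read the three-digit status code from a status line such as "HTTP/1.1 200 OK".
// Returns -1 if the response does not start with a well-formed status line.
int parse_status_code(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) return -1;
    std::size_t space = response.find(' ');
    if (space == std::string::npos || space + 4 > response.size()) return -1;
    int code = 0;
    for (std::size_t i = space + 1; i < space + 4; ++i) {
        if (response[i] < '0' || response[i] > '9') return -1;
        code = code * 10 + (response[i] - '0');
    }
    return code;
}

// The body is everything after the first blank line (CRLF CRLF).
std::string extract_body(const std::string& response) {
    std::size_t pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : response.substr(pos + 4);
}
```

    The string from build_get_request("www.example.com", "c++ sockets") is what you would write to a connected socket; whatever comes back can be handed to parse_status_code and extract_body. A real response may use chunked encoding or compression, which this sketch ignores.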
    All problems in computer science can be solved by another level of indirection,
    except for the problem of too many layers of indirection.
    – David J. Wheeler

  3. #3
    Registered User
    Join Date
    Apr 2003
    Posts
    2,663
    Windows Sockets Programming

    This page is designed to give the programmer enough information to know where to look to get the actual information.

    What is Windows Sockets?
    Windows Sockets, or Winsock, is a .DLL which allows applications to talk over a network, usually the Internet. The .DLL is usually called WINSOCK.DLL.
    http://www.snible.org/winsock/

    Programs can communicate with each other in a variety of ways. They can use files, anonymous/named pipes, System V interprocess messaging primitives, BSD sockets, and TLI (Transport Layer Interface). Socket and TLI communications come under the purview of "networking," a step up from the other IPC (interprocess communication) mechanisms, because they don't constrain the communicating processes to be on the same machine. ...

    Mail (paper and electronic) and telephones are two distinct forms of communication. A telephone conversation is connection-oriented, because the caller and the called "own" the line (have a continuous link) until the end of the conversation. Connection-oriented communication guarantees message delivery, preserves the order in which messages are sent, and allows a stream of data to be sent. Mail, in contrast, is a connectionless mode of transfer, which transports information in packets (or datagrams) and gives no guarantees about message delivery and the order in which the packets are received. It has a higher overhead because each packet identifies its sender and the intended receiver; in contrast, a connection-oriented conversation proceeds without further ado, once the parties have identified themselves. Computer networks offer you a similar choice of connection versus connectionless mode of data transfer. It must be mentioned that there are connectionless protocols such as reliable UDP that do offer guaranteed delivery and sequence integrity.

    The networking world assigns each computer an internet address, also called an IP address (short for Internet Protocol), a sequence of four bytes typically written in a dot sequence, like this: 192.23.34.1. (This will change with IPv6, because the world is fast running out of four-byte IP addresses.) Just as you have convenient phone aliases such as 1-800-FLOWERS, computers are often given unique aliases, such as www.yahoo.com. Now, many programs can run on one machine, and it is not enough to deliver a message to the machine: it has to be handed over to the appropriate application program running on that machine. A program can ask for one or more ports to be opened, the equivalent of a private mailbox or telephone extension. To send a message to a program, you need its full address: its machine name and the port on which it is listening. Standard applications such as ftp, telnet, and mail actually come in pairs; for example, the ftp program you use talks to a counterpart server program called ftpd (ftp daemon) on the remote computer. Such server programs listen on standard port numbers; when you type www.yahoo.com on your web browser, the browser automatically connects to port 80 on that machine, where it assumes the corresponding web server to be listening. Port numbers 0-1023 are reserved for standard, well-known Internet applications. Many platforms reserve the name "localhost" (and the address 127.0.0.1) to mean the machine on which the program is running.
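
    To make the "machine name plus port" idea concrete, here is a small C++ sketch (not from the quoted book, which uses Perl) that splits a plain http:// URL into host, port, and path, falling back to port 80 when none is given. The Endpoint struct and parse_http_url name are hypothetical, and the parser deliberately ignores https, user info, and IPv6 literals.

```cpp
#include <string>

// The "full address" of a web server plus the resource being requested.
struct Endpoint {
    std::string host;  // machine name, e.g. "www.yahoo.com"
    int port;          // port the server listens on; 80 is the HTTP default
    std::string path;  // resource on that server, "/" if none was given
};

// ASSUMPTION: input looks like "http://host[:port][/path]" or "host[:port][/path]".
// std::stoi throws if the port part is not a number.
Endpoint parse_http_url(const std::string& url) {
    const std::string scheme = "http://";
    std::size_t start =
        url.compare(0, scheme.size(), scheme) == 0 ? scheme.size() : 0;

    // Everything up to the first '/' is the host[:port] part.
    std::size_t slash = url.find('/', start);
    std::string authority = (slash == std::string::npos)
                                ? url.substr(start)
                                : url.substr(start, slash - start);

    Endpoint ep;
    ep.port = 80;
    ep.path = (slash == std::string::npos) ? std::string("/") : url.substr(slash);

    std::size_t colon = authority.find(':');
    if (colon == std::string::npos) {
        ep.host = authority;
    } else {
        ep.host = authority.substr(0, colon);
        ep.port = std::stoi(authority.substr(colon + 1));
    }
    return ep;
}
```

    For example, parse_http_url("http://www.yahoo.com") gives host "www.yahoo.com", port 80, path "/". Turning the host name into an IP address is a separate step (typically getaddrinfo) that is not shown here.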

    Once assigned a socket, your program has a choice of using a connection-oriented protocol called TCP/IP (Transmission Control Protocol/IP) or a connectionless one, UDP/IP (User Datagram Protocol). Clearly, sender and receiver must use the same protocol. The TCP/IP model is usually preferred over UDP because it provides for data sequencing, end-to-end reliability (checksums, positive acknowledgments, time-outs), and end-to-end flow control (if the sender is sending data faster than the receiver can handle it, it will block the sender when the receiver's buffers are full). If the communications medium is very good, such as a LAN, UDP may perform much better because it doesn't spend time accounting for the worst case. In a production system, however, you can never really take a chance, so we will stick to TCP in this chapter.
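
    A minimal C++ sketch of a TCP conversation, assuming a POSIX system (Linux, macOS, BSD); on Windows the Winsock calls are similar but need WSAStartup and closesocket. Both ends live in one process on 127.0.0.1 so it runs without internet access. The function name is hypothetical, and it is only meant for short messages: a large one could fill the socket buffer and block, since nothing reads while send is running.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Send `message` from a client socket to a server socket over loopback TCP
// and return what the server side received. Returns "" on any socket error.
std::string tcp_loopback_send(const std::string& message) {
    int listener = socket(AF_INET, SOCK_STREAM, 0);  // SOCK_STREAM selects TCP
    if (listener < 0) return "";

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   // 127.0.0.1
    addr.sin_port = 0;                               // 0 = let the OS pick a free port
    socklen_t len = sizeof addr;
    if (bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        close(listener);
        return "";
    }

    // Client side: connect to the port the OS chose for the listener.
    int client = socket(AF_INET, SOCK_STREAM, 0);
    if (client < 0 ||
        connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0) {
        if (client >= 0) close(client);
        close(listener);
        return "";
    }

    // Server side: accept the queued connection.
    int server = accept(listener, nullptr, nullptr);
    if (server < 0) {
        close(client);
        close(listener);
        return "";
    }

    send(client, message.data(), message.size(), 0);
    shutdown(client, SHUT_WR);  // signal end of stream so recv() sees EOF

    std::string received;
    char buf[256];
    ssize_t n;
    while ((n = recv(server, buf, sizeof buf, 0)) > 0) {
        received.append(buf, static_cast<std::size_t>(n));
    }

    close(server);
    close(client);
    close(listener);
    return received;
}
```

    Swapping SOCK_STREAM for SOCK_DGRAM would select UDP instead, but the connect/accept steps would no longer apply because UDP has no connection to set up.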

    The socket abstraction and API were introduced in BSD 4.2 to provide a uniform interface over different types of protocols (there are others besides TCP and UDP), and, depending on the protocol used, a socket behaves like either a telephone receiver or a mailbox. In either case, it takes one socket on each side to make a conversation (which is why sockets are also known as communications end-points). The socket API allows you to specify the domain of the communicating entities - the "Unix domain" is used for processes on the same machine, and the "Internet domain" is used for processes on different machines. This chapter examines the more generally accepted (and useful) "Internet domain" option.
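
    For comparison, the "Unix domain" mentioned above can be tried in a few lines of C++ on a POSIX system. This sketch uses socketpair, which hands back two already-connected end-points on the same machine; the function name is hypothetical and it is intended for short messages only.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <string>

// Pass `message` between two connected Unix-domain sockets in one process.
// Returns what the receiving end-point read, or "" on error.
std::string unix_pair_send(const std::string& message) {
    int fds[2];
    // AF_UNIX = same-machine domain; AF_INET would be the Internet domain.
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) return "";

    send(fds[0], message.data(), message.size(), 0);
    shutdown(fds[0], SHUT_WR);  // no more data from this side

    std::string received;
    char buf[256];
    ssize_t n;
    while ((n = recv(fds[1], buf, sizeof buf, 0)) > 0) {
        received.append(buf, static_cast<std::size_t>(n));
    }

    close(fds[0]);
    close(fds[1]);
    return received;
}
```

    The send and recv calls are the same ones used with Internet-domain sockets; only the domain passed when the sockets are created differs.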

    TLI (Transport Layer Interface), another API introduced in System V (Release 3.0, 1986), provides a very similar-looking alternative to the socket abstraction, but because it is not as widely used as the BSD socket interface, we will not discuss it in this chapter.
    Advanced Perl Programming, O'Reilly

    Search around for "sockets" and "network programming". If you find a simple good tutorial, post a link.
    Last edited by 7stud; 01-09-2007 at 02:38 AM.
