Thread: Requesting page source code with HttpWebRequest => Unable to connect

  1. #1

    Hey there. I am trying to extract some data from a few websites for my personal use (no, it's not some kind of spam bot). I managed to extract data from one website, but when I try to connect to another one I get an error I don't really understand. Any help?

    Code:
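    // Serie and Logger are defined elsewhere in the project (not shown)
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Net;
    using System.Text;
    
    public abstract class DataFetcher {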
    	public List<Serie> Contents;
    	public List<KeyValuePair<Serie, int>> Followed;
    	
    	protected string SourceCode;
    	protected string Identifier;
    	
    	public void FetchSourceCode(string url) {
    		Logger.ToLog("Requesting '" + url + "'");
    		HttpWebRequest request = (HttpWebRequest) WebRequest.Create(url);
    
    		// 'using' disposes the response and the reader, releasing the
    		// connection even if reading fails
    		using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
    		using (StreamReader reader = new StreamReader(response.GetResponseStream())) {
    			Console.WriteLine("Reading webpage...");
    			// StreamReader decodes the whole body (UTF-8 unless a byte-order mark says
    			// otherwise), so accented characters are not turned into '?' as with
    			// Encoding.ASCII, and multi-byte characters are not split across buffers
    			SourceCode = reader.ReadToEnd();
    		}
    	}
    	
    	public void GetContentsFromFile(string path = "") {
    		// irrelevant
    	}
    	
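    	public void LogContents() {
    		// 'using' closes the writer when done; without it the buffered lines
    		// may never be flushed to the file
    		using (StreamWriter w = new StreamWriter(Identifier + ".txt")) {
    			foreach (Serie i in Contents) {
    				w.WriteLine(i.Name + ';' + i.Quantity.ToString() + ';' + i.URL + ';' + (i.Follow == false ? '0' : '1'));
    			}
    		}
    	}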
    	
    	public DataFetcher() {
    		SourceCode = "";
    		Contents = new List<Serie>();
    		Followed = new List<KeyValuePair<Serie, int>>();
    	}
    	
    	protected abstract bool NextSerie(string contents, ref int lastIndexUsed);
    	public abstract List<Serie> GetContents(string source = "");
    	public abstract List<KeyValuePair<Serie, int>> CheckNewEpisodesFollowed();
    }
    
    public class DF_DPS : DataFetcher {
    	protected override bool NextSerie(string contents, ref int lastIndexUsed) {
    		// irrelevant
    	}
    	
    	public override List<Serie> GetContents(string source) {
    		if(SourceCode == "") FetchSourceCode("http://www.dpstream.net/serie.html");
    		int workingIndex = SourceCode.IndexOf("rsm2");
    		while(NextSerie(SourceCode, ref workingIndex) == true);
    		
    		return Contents;
    	}
    	
    	public override List<KeyValuePair<Serie, int>> CheckNewEpisodesFollowed() {
    		return null;
    	}
    	
    	public DF_DPS() {
    		Identifier = "DPStream";
    	}
    }
    And here is the error message:
    System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
    at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
    at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
    --- End of inner exception stack trace ---
    at System.Net.HttpWebRequest.GetResponse()
    at DataFetcher.FetchSourceCode(String url) in c:\Users\win7\Documents\SharpDevelop Projects\net\net\Program.cs:line 55
    at DF_DPS.GetContents(String source) in c:\Users\win7\Documents\SharpDevelop Projects\net\net\Program.cs:line 212
    at WebFetch.Main(String[] args) in c:\Users\win7\Documents\SharpDevelop Projects\net\net\Program.cs:line 232
    My logger shows that the URL is correct, and since both sites are fetched with the same method (it is implemented in the base class), I don't really know where to look for my mistake.
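    A minimal sketch for narrowing this kind of error down, assuming only the standard WebException.Status values: catching the exception around GetResponse() and inspecting its status separates a failure to open the TCP connection at all (ConnectFailure, which is what "Unable to connect to the remote server" corresponds to) from an HTTP-level error where the server did answer (ProtocolError). The WebErrorInfo.Describe name is hypothetical.

```csharp
using System.Net;

// Hypothetical helper: summarizes what kind of failure a WebException represents.
public static class WebErrorInfo
{
    public static string Describe(WebException ex)
    {
        switch (ex.Status)
        {
            case WebExceptionStatus.ConnectFailure:
                // No TCP connection was established, so no HTTP request was
                // ever sent and there is no HTTP status code to look at.
                return "connect failure: the connection to the remote host could not be established";
            case WebExceptionStatus.NameResolutionFailure:
                return "DNS failure: the host name could not be resolved";
            case WebExceptionStatus.Timeout:
                return "timeout: the request did not complete within its Timeout";
            case WebExceptionStatus.ProtocolError:
            {
                // The server did answer, but with an HTTP error status (4xx/5xx).
                HttpWebResponse resp = ex.Response as HttpWebResponse;
                return resp != null
                    ? "HTTP error " + (int)resp.StatusCode + " " + resp.StatusDescription
                    : "HTTP error (no response object available)";
            }
            default:
                return "other failure: " + ex.Status;
        }
    }
}
```

    Usage would be along the lines of wrapping request.GetResponse() in try { ... } catch (WebException ex) { Logger.ToLog(WebErrorInfo.Describe(ex)); throw; }. A ConnectFailure for a URL that loads in a browser suggests the problem lies in the network path or in the server refusing this particular client, rather than in the URL string itself.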

    Thanks.

    Edit: Any general comments on my C# style are also welcome. I literally just started out with the language specification and a lot of googling.
    Last edited by Alexandre; 10-05-2011 at 01:26 PM.
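    On the style question, one thing worth knowing about reading a response through a fixed-size byte buffer: calling Encoding.ASCII.GetString on each chunk turns every non-ASCII byte (e.g. accented letters) into '?', and even a UTF-8 GetString per chunk can break a multi-byte character that happens to straddle two buffers. The sketch below is a hypothetical demonstration (ChunkDecoding is an illustrative name) comparing per-chunk decoding with a single stateful System.Text.Decoder, which is the approach StreamReader uses internally.

```csharp
using System.Text;

// Illustrative comparison of two chunk-by-chunk decoding strategies.
public static class ChunkDecoding
{
    // Decodes every chunk on its own, as a per-buffer GetString call does.
    // Bytes the encoding cannot map, or an incomplete multi-byte sequence at
    // a chunk boundary, become replacement characters.
    public static string DecodePerChunk(Encoding enc, byte[][] chunks)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte[] chunk in chunks)
            sb.Append(enc.GetString(chunk, 0, chunk.Length));
        return sb.ToString();
    }

    // Uses one Decoder for the whole stream: an incomplete character at the
    // end of a chunk is held back and completed by the next chunk.
    public static string DecodeStateful(Encoding enc, byte[][] chunks)
    {
        Decoder decoder = enc.GetDecoder();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunks.Length; i++)
        {
            byte[] chunk = chunks[i];
            bool last = i == chunks.Length - 1; // flush leftover state on the final chunk
            char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length, last)];
            int n = decoder.GetChars(chunk, 0, chunk.Length, chars, 0, last);
            sb.Append(chars, 0, n);
        }
        return sb.ToString();
    }
}
```

    With a Decoder in the loop, or simply new StreamReader(stream).ReadToEnd(), the remaining question is which charset the page actually uses; HttpWebResponse.CharacterSet reports what the server declared.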

  2. #2
    Something strange is going on with that web page. I tried accessing it last night with Internet Explorer and it worked fine. I then wrote a small program to access it:

    Code:
                // needs: using System; using System.IO; using System.Net;
                Console.WriteLine("start");
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.dpstream.net/serie.html");
                // 'using' releases the connection once the page has been read
                using (WebResponse res = req.GetResponse())
                using (StreamReader reader = new StreamReader(res.GetResponseStream()))
                {
                    String html = reader.ReadToEnd();
                    Console.WriteLine(html);
                }
                Console.WriteLine("end");
    I got the same "Unable to connect" exception as you. I also noticed that if I then tried to load the page again in Internet Explorer, it failed to load there too.

    Then today I repeated the test: first Internet Explorer, which loaded the page fine; then my program, which failed; then Internet Explorer again, which also failed to load the page.

    Since exactly the same thing happened in both sets of tests, I am beginning to think that the web server is temporarily banning me because my program's HTTP request may be missing certain headers that a browser would send.

    This is probably done to keep out spam bots. If my theory is correct, you could packet-sniff Internet Explorer to see exactly which headers it includes in its HTTP requests, and then try duplicating them in your program. If I'm wrong, then ignore this post lol.
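    If you do capture the browser's request, a minimal sketch for replaying those headers on an HttpWebRequest might look like the following. BrowserHeaders.Apply is a hypothetical helper; the special-cased names reflect the fact that HttpWebRequest exposes some headers (User-Agent, Accept, Referer) as properties, and the .NET Framework rejects them in Headers.Add.

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper: copies headers captured from a browser (packet sniffer
// or the browser's developer tools) onto an HttpWebRequest.
public static class BrowserHeaders
{
    public static void Apply(HttpWebRequest req, IDictionary<string, string> headers)
    {
        foreach (KeyValuePair<string, string> h in headers)
        {
            switch (h.Key.ToLowerInvariant())
            {
                // These have dedicated properties; the .NET Framework throws
                // if they are passed to Headers.Add.
                case "user-agent": req.UserAgent = h.Value; break;
                case "accept": req.Accept = h.Value; break;
                case "referer": req.Referer = h.Value; break;
                // Managed by HttpWebRequest itself (Host comes from the URL).
                case "host":
                case "connection":
                case "content-length":
                    break;
                // Copying this would make the server send a compressed body that
                // HttpWebRequest only decompresses if AutomaticDecompression is set.
                case "accept-encoding":
                    break;
                default:
                    req.Headers[h.Key] = h.Value;
                    break;
            }
        }
    }
}
```

    For example, calling BrowserHeaders.Apply(request, captured) right after WebRequest.Create(url) in FetchSourceCode would give every DataFetcher subclass the same browser-like headers, since they all connect through that base-class method.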

  3. #3
    I think I was right. This works.

    Code:
                // needs: using System; using System.IO; using System.Net;
                Console.WriteLine("start");
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.dpstream.net/serie.html");
                // Host is normally derived from the URL, so this line is optional
                req.Host = "www.dpstream.net";
                // Headers a browser would send; without them the server appears to block the client
                req.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2";
                req.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                req.Headers.Add("Accept-Language", "en-gb,en;q=0.5");
                req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7");
                using (WebResponse res = req.GetResponse())
                using (StreamReader reader = new StreamReader(res.GetResponseStream()))
                {
                    String html = reader.ReadToEnd();
                    Console.WriteLine(html);
                }
                Console.WriteLine("end");
    Last edited by theoobe; 10-06-2011 at 06:33 AM.

  4. #4
    Hey, thank you, it worked.
