using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net;
using System.Text;
using System.Threading.Tasks;
using Caty.Spider.Crawler.Events;

namespace Caty.Spider.Crawler
{
    public class SimpleCrawler : ICrawler
    {
        public event EventHandler<OnStartEventArgs> OnStart;

        public event EventHandler<OnCompletedEventArgs> OnCompleted;

        public event EventHandler<OnErrorEventArgs> OnError;

        public CookieContainer CookiesContainer { get; set; }

        public SimpleCrawler() { }

        public async Task<string> Start(Uri uri, string proxy = null)
        {
            return await Task.Run(() =>
            {
                var pageSource = string.Empty;
                try
                {
                    if (this.OnStart != null) this.OnStart(this, new OnStartEventArgs(uri));

                    // Time the whole request/response cycle.
                    var watch = new Stopwatch();
                    watch.Start();

                    var request = (HttpWebRequest)WebRequest.Create(uri);
                    request.Accept = "*/*";
                    request.ContentType = "application/x-www-form-urlencoded";
                    request.AllowAutoRedirect = false;
                    request.UserAgent = "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36";
                    request.Timeout = 5000; // milliseconds
                    request.KeepAlive = true;
                    request.Method = "GET";
                    if (proxy != null) request.Proxy = new WebProxy(proxy);
                    request.CookieContainer = this.CookiesContainer;
                    request.ServicePoint.ConnectionLimit = int.MaxValue;

                    using (var response = (HttpWebResponse)request.GetResponse())
                    {
                        foreach (Cookie cookie in response.Cookies) this.CookiesContainer.Add(cookie);

                        // Decompress the body according to the Content-Encoding header.
                        if (response.ContentEncoding.ToLower().Contains("gzip"))
                        {
                            using (GZipStream stream = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
                            {
                                using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                                {
                                    pageSource = reader.ReadToEnd();
                                }
                            }
                        }
                        else if (response.ContentEncoding.ToLower().Contains("deflate"))
                        {
                            using (DeflateStream stream = new DeflateStream(response.GetResponseStream(), CompressionMode.Decompress))
                            {
                                using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                                {
                                    pageSource = reader.ReadToEnd();
                                }
                            }
                        }
                        else
                        {
                            using (Stream stream = response.GetResponseStream())
                            {
                                using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
                                {
                                    pageSource = reader.ReadToEnd();
                                }
                            }
                        }
                    }

                    request.Abort();
                    watch.Stop();

                    var threadId = System.Threading.Thread.CurrentThread.ManagedThreadId;
                    var milliseconds = watch.ElapsedMilliseconds;
                    if (this.OnCompleted != null) this.OnCompleted(this, new OnCompletedEventArgs(uri, threadId, milliseconds, pageSource));
                }
                catch (Exception ex)
                {
                    // Report the failure through OnError; the caller receives whatever was read so far.
                    if (this.OnError != null) this.OnError(this, new OnErrorEventArgs(uri, ex));
                }

                return pageSource;
            });
        }
    }
}