Spark Quick Start Series (3): Understanding RDDs in Depth

2022-09-17 13:29:56

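Contents

  • RDDs in Depth
    • Case Study
    • RDDs Revisited
      • Why did RDDs emerge?
      • Characteristics of RDDs
      • What is a resilient distributed dataset?
  • Summary: The Five Main Properties of an RDD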

RDDs in Depth

Goal

Understand the internal logic of RDDs in depth, as well as their internal properties (what an RDD is made of).

Case Study

Requirement

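Given a website's access records, commonly known as an access log,
find the distinct IP addresses that appear in it and the number of visits from each.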
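
To make the expected result concrete, the requirement can be sketched in plain Python without Spark. This only illustrates the logic (extract the IP, count per IP, sort by count); it is not the Spark implementation. The names `count_ips` and `sample_lines` are hypothetical, the sample records are shortened for readability, and the sketch assumes the client IP is the first whitespace-separated field of each record.

```python
from collections import Counter


def count_ips(lines):
    """Return (ip, visits) pairs, most frequent first, ties ordered by IP string."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            # Skip blank lines instead of failing on them.
            continue
        # Assumption: the first field of each record is the client IP.
        counts[fields[0]] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical, shortened records (query strings and user agents dropped).
sample_lines = [
    '190.217.63.59 - - [01/Nov/2017:00:00:15  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl HTTP/1.1" 200 133',
    '181.64.62.158 - - [01/Nov/2017:00:02:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl HTTP/1.1" 200 133',
    '181.64.62.158 - - [01/Nov/2017:00:03:16  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl HTTP/1.1" 200 133',
]

for ip, visits in count_ips(sample_lines):
    print(ip, visits)
# prints:
# 181.64.62.158 2
# 190.217.63.59 1
```

The same three steps (take the IP, accumulate a count per key, sort by count) are what a distributed version has to perform across partitions.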

Create a data file named access_log_sample.txt (the full dataset is too large to include here, so 100 records are used for now):

190.217.63.59 - - [01/Nov/2017:00:00:15  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
76.114.21.96 - - [01/Nov/2017:00:00:31  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://tricolor.entravision.com/sacramento/escucha-en-vivo/&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
206.126.121.204 - - [01/Nov/2017:00:00:46  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://zone.msn.com/gameplayer/gameplayer.aspx?game=familyfeud&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
154.121.8.18 - - [01/Nov/2017:00:01:01  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://www.google.dz/search&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko/20100101 Firefox/11.0"
190.238.37.217 - - [01/Nov/2017:00:01:17  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:01:31  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://s-usweb.dotomi.com/renderer/delPublishersCookies.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
200.78.93.132 - - [01/Nov/2017:00:01:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/login/device-based/regular/login/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.200.173.170 - - [01/Nov/2017:00:01:59  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/glade.js&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.252.185.4 - - [01/Nov/2017:00:02:15  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://www.google.cm/blank.html&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:34.0) Gecko/20100101 Firefox/34.0"
190.90.22.125 - - [01/Nov/2017:00:02:29  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://www.raicesdeeuropa.com/grandes-obras-de-los-principales-escritores-nacidos-durante-el-siglo-xix/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:02:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://bancaporinternet.interbank.com.pe/Warhol/redireccionaInicioLogueo&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
122.54.153.240 - - [01/Nov/2017:00:03:00  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
181.64.62.158 - - [01/Nov/2017:00:03:16  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.8 - - [01/Nov/2017:00:03:33  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/search?rlz=1C2AOHY_esPE760PE760&source=hp&ei=Uw_5WeGVA4TjmAHO8aCgDw&q=fb&oq=fb&gs_l=psy-ab.3..0i131k1j0l4j0i131k1l2j0l3.1767.1916.0.2135.2.2.0.0.0.0.144.269.0j2.2.0....0...1.1.64.psy-ab..0.2.267....0.pWGbpZy6zwg&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
190.110.200.41 - - [01/Nov/2017:00:03:50  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/rsrc.php/v3i0KB4/ye/l/es_LA/G6VcGRK_54X.js&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
77.180.73.169 - - [01/Nov/2017:00:04:06  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://gomovies.co/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:04:22  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://mm-a.akamaihd.net/160/sn/assets/common/3d/particle/ns2/texture/line_040.dxt?v=25960&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
181.64.146.165 - - [01/Nov/2017:00:04:39  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/rsrc.php/yR/r/lvSDckxyoU5.ogg&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36"
201.240.33.214 - - [01/Nov/2017:00:04:55  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://es.savefrom.net/#url=http://youtube.com/watch?v=gr_3VrQC8qY&utm_source=youtube.com&utm_medium=short_domains&utm_campaign=ssyoutube.com&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.56.58 - - [01/Nov/2017:00:05:10  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://scontent.flim5-4.fna.fbcdn.net/v/t1.0-1/p32x32/22310580_351017335344058_8554274362948717253_n.jpg?oh=5da979568a22e425b79b7ba788dbc30a&oe=5A65BCC3&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.192.238 - - [01/Nov/2017:00:05:26  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/?gws_rd=ssl&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.255.225.35 - - [01/Nov/2017:00:05:41  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.ar/search?q=886971865721&oq=886971865721&aqs=chrome..69i57.719j0j7&sourceid=chrome&ie=UTF-8&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.211.197.246 - - [01/Nov/2017:00:05:56  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.netflix.com/logout?locale=es-EC&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.3.230.121 - - [01/Nov/2017:00:06:11  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://baixar.programanex.com.br/latest/setup_nex.exe&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
175.158.226.85 - - [01/Nov/2017:00:06:28  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://web.facebook.com/?_rdc=1&_rdr&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.151.60.116 - - [01/Nov/2017:00:06:43  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://www.edhelper.com/edhelper_monthly.htm&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.157.88 - - [01/Nov/2017:00:06:58  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://auth.kaybo1.com/member/login.html?back_url=http://pb.kaybo1.com/event/evt20170301_event/event01.html&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.73.28.212 - - [01/Nov/2017:00:07:13  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.43.170.133 - - [01/Nov/2017:00:07:29  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.250.170 - - [01/Nov/2017:00:07:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://r1---sn-5mncvap8p5-a2ce.googlevideo.com/generate_204&cat=media-streaming HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
190.237.183.6 - - [01/Nov/2017:00:08:01  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://musicaq.biz/song.php?id=Q2hpbml0byBEZWwgQW5kZSAtICAgIFByaW1pY2lhfGh0dHBzOi8vYXBpLnNvdW5kY2xvdWQuY29tL3RyYWNrcy83MjkxNjE2NS9zdHJlYW0%2FY2xpZW50X2lkPTBmOGZkYmJhYTIxYTliZDE4MjEwOTg2YTdkYzJkNzJj&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.67.2.102 - - [01/Nov/2017:00:08:17  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://lcperu.edestinos.com.pe/check-in-online&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.121.218.21 - - [01/Nov/2017:00:08:33  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.4kdownload.com/buy/videodownloader?source=videodownloader&redirect-locale=es&ui_source=show-on-run-3&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
76.23.172.162 - - [01/Nov/2017:00:08:48  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://open.spotify.com/&cat=music HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.64.101.27 - - [01/Nov/2017:00:09:04  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
114.186.152.178 - - [01/Nov/2017:00:09:21  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.234.56.200 - - [01/Nov/2017:00:09:36  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://toroadvertisingmedia.com/cr?b=218558&p=7550&c=6608&h=0d7386ae207d128d276c8fc974f8f99b&l=CO&tz=-5.0&sh=768.0&sw=1360.0&ad.trans.id=wzj9mrkhearh&t=1509494794724&u=https%3A%2F%2Fwww.popcornvod.com%2Fwelcome.html%3Faff%3D4054%26theme%3D0922%26clickid%3DOCM2NjA4IzI0MyM3NTUwfDIxODU1OHxDT3wzfDF8fHx3emo5bXJraGVhcmh8fHw%26pub%3D1400%26sub_pub_id%3D&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.233.78.10 - - [01/Nov/2017:00:09:52  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://www.flvto.biz/es/downloads/mp3/yt_5S-Fjz5CR5s/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.46.172.102 - - [01/Nov/2017:00:10:07  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://jkanime.net/kimi-ni-todoke-2/5/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.148.47.237 - - [01/Nov/2017:00:10:24  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2800.0 Iron Safari/537.36"
182.251.246.12 - - [01/Nov/2017:00:10:39  0000] "GET /webapi/getcategory?uri=yakusoku.cocoloni.jp&cat=society HTTP/1.1" 200 60 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
181.234.203.122 - - [01/Nov/2017:00:10:56  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://bonusbitcoin.co/faucet&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
103.4.190.242 - - [01/Nov/2017:00:11:10  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://s1-word-edit-15.cdn.office.net/we/s/1687297775_App_Scripts/2057/WordEditor.Wac.TellMeModel.js&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
72.182.173.74 - - [01/Nov/2017:00:11:23  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http://store.steampowered.com/agecheck/app/744640/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.179.100.64 - - [01/Nov/2017:00:11:36  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-dialing.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
74.71.124.140 - - [01/Nov/2017:00:11:50  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://my.netzero.net/s/sp&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
94.189.216.28 - - [01/Nov/2017:00:12:04  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://www.nba.com/&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.131.9.222 - - [01/Nov/2017:00:12:17  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://www.google.com.mx/?gfe_rd=cr&dcr=0&ei=WRH5WeezN-bo8AeEoo7oDw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
179.7.171.84 - - [01/Nov/2017:00:12:33  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
200.106.89.161 - - [01/Nov/2017:00:12:49  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://www.pnp.gob.pe/admision_EESTP_PNP/prospecto_proceso_admision_ETSPNP_2017_II.pdf&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0"
187.222.252.169 - - [01/Nov/2017:00:13:05  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://s1-word-view-15.cdn.office.net/wv/s/1687297775_resources/3082/progress16.gif&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.146.42.248 - - [01/Nov/2017:00:13:22  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://dquchx93qmjdu.cloudfront.net/s3/resources/sound/common/pickweapon_69eea0cef175a3faa11eca989f346a4c.mp3&cat=content-delivery-network HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36"
189.181.11.35 - - [01/Nov/2017:00:13:38  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://adexc.net/network/?ref_prm=28401&clck=b0ajqvw8zzni&pub_sd=M82IMGZFR&ad_spv=549&cat=botnet HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
190.186.200.125 - - [01/Nov/2017:00:13:56  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://es-la.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.171.208.228 - - [01/Nov/2017:00:14:11  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://chiquitests.com/enchinan/&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
75.86.115.195 - - [01/Nov/2017:00:14:28  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; CrOS x86_64 9765.85.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.123 Safari/537.36"
201.240.33.221 - - [01/Nov/2017:00:14:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://cf-media.sndcdn.com/OaJxdnP5Fsen.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLW1lZGlhLnNuZGNkbi5jb20vT2FKeGRuUDVGc2VuLjEyOC5tcDMiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE1MDk0OTU2Nzh9fX1dfQ__&Signature=SjqtGj2LWI9SCvgiIzNXs4M7P7eA-OCfi~~MwNzxFQ-Pft1DLkoDuUx1vnqf0JC0BGKRegqep0hiMxiJMUUBVLYzEtZq0jZFZKz90zO8lyfvOG38vwnbUj68Jcpb6PTTvwLK1lK9Oo8RA1DSQ-NmA1v1yj8N0DQBZmEF2RXRbmXxgh7kSledHq2OFfQ1Im-OLJyvFEH2Mq-4c3YruyvdxSPxBOkp81CL53ceEm9oAYNThc-7HXv5LPbqB~OrcjqXi0VihyE4MSoIou08~3sZBNTpq2fB4RhP8TnoNblAQtWsPMEj~hXTX9cJ3WrOvb9k67DV3HKf0RYfpiX-jFTfog__&Key-Pair-Id=APKAJAGZ7VMH2PFPW6UQ&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
202.151.22.3 - - [01/Nov/2017:00:15:00  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=http://www.fijitimes.com/story.aspx&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
181.176.73.81 - - [01/Nov/2017:00:15:16  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://web.facebook.com/login.php?login_attempt=1&lwv=110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.208.200 - - [01/Nov/2017:00:15:33  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/search?safe=strict&hl=es&biw=1366&bih=662&tbm=isch&sa=1&ei=WxD5WebYNIj4wASqorPIDw&q=contribucion&oq=con&gs_l=psy-ab.1.1.0i67k1l5j0l5.447088.451222.0.455291.37.12.0.0.0.0.394.1792.2-4j2.7.0....0...1.1.64.psy-ab..31.5.1471.0..0i30k1.248.GHQlbsuDZcQ&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.144.93 - - [01/Nov/2017:00:15:48  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.ar.avon.com/REPSuite/orderEntry.page?redirected=true&isSuccess=Y&cat=shopping HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:16:03  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
73.213.34.16 - - [01/Nov/2017:00:16:18  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://swx.cdn.skype.com/assets/v/0.0.300/audio/m4a/call-outgoing-p1.m4a&cat=internet-communication HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.240.247.104 - - [01/Nov/2017:00:16:33  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/search?q=como prepara par ahacer una mascara de pantomima&rlz=1C1NHXL_esPE709PE709&oq=como prepara par ahacer una mascara de pantomima&aqs=chrome..69i57.19536j0j7&sourceid=chrome&ie=UTF-8&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.130.189.170 - - [01/Nov/2017:00:16:49  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://es.123rf.com/imagenes-de-archivo/ombligo.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.204.104.89 - - [01/Nov/2017:00:17:04  0000] "GET /webapi/getcategory?uri=www.google.co.ve&cat=search-engine HTTP/1.1" 200 67 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
190.42.233.34 - - [01/Nov/2017:00:17:20  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.facebook.com/&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.136.98.155 - - [01/Nov/2017:00:17:35  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://musicaq.biz/descargar-musica/9f352ef6-santana-the-game-of-love-ft-michelle-branch.html&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
147.147.163.182 - - [01/Nov/2017:00:17:50  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=https://www.worldtimebuddy.com/&cat=unknown HTTP/1.1" 200 133 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0"
190.232.70.238 - - [01/Nov/2017:00:18:06  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://sv3.onlinevideoconverter.com/download?file=e4c2d3a0e4a0c2&cat=adult-and-pornography HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
49.148.209.194 - - [01/Nov/2017:00:18:21  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://web.roblox.com/&cat=game HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
189.170.192.60 - - [01/Nov/2017:00:18:36  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQcdjN8-1NJnSeC6ptIlx7S0wZucgg1jzL4N-i7IWE_8o8-F0gmjw&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
161.18.215.235 - - [01/Nov/2017:00:18:50  0000] "GET /webapi/getcategory?uri=www.wattpad.com&cat=personal-site-and-blog HTTP/1.1" 200 75 "-" "Apache-HttpClient/UNAVAILABLE (java 1.4)"
138.36.222.166 - - [01/Nov/2017:00:19:04  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
201.230.112.110 - - [01/Nov/2017:00:19:20  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://civilgeeks.com/categor%C3%ADa/hidraulica/&cat=personal-site-and-blog HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.234.49.7 - - [01/Nov/2017:00:19:35  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/search?q=PINTEREST&oq=PINTERE&aqs=chrome.0.69i59j69i60j69i65j69i57j0l2.2160j0j1&sourceid=chrome&ie=UTF-8&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.236.239.11 - - [01/Nov/2017:00:19:49  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://scontent.flim5-3.fna.fbcdn.net/v/t1.0-1/p32x32/22687826_1976412995963948_3676302371441952941_n.jpg?oh=7bc40797d744c7b5d94dd368ae4de823&oe=5A6CCB3A&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
73.193.233.55 - - [01/Nov/2017:00:20:04  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://web2.secureinternetbank.com/pbi_pbi1151/login/Remote/221272028&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.66.152.36 - - [01/Nov/2017:00:20:19  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://es.answers.yahoo.com/question/index?qid=20120715103200AAX15LS&cat=internet-portal HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
24.12.190.248 - - [01/Nov/2017:00:20:35  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=http://mesgmy.ebay.com/ws/eBayISAPI.dll?ViewMyMessages&_trksid=p2057872.m2034.l3912&CurrentPage=MyeBayMyMessages&ssPageName=STRK:ME:LNLK:None&FClassic=true&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.88.204.14 - - [01/Nov/2017:00:20:50  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://www.espn.com.ve/futbol/resultados/_/liga/todo/fecha/20171030&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
121.208.9.139 - - [01/Nov/2017:00:21:06  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://contest.cartoonnetwork.com.au/mobile/&cat=entertainment-and-art HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.181.93 - - [01/Nov/2017:00:21:21  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.226.68.247 - - [01/Nov/2017:00:21:35  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://articulo.mercadolibre.com.ar/MLA-666799963-ipod-classic-_JM&cat=auctions HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.237.218.215 - - [01/Nov/2017:00:21:51  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.google.com.pe/search?safe=strict&rlz=1C1NHXL_esPE700PE709&ei=lRP5WaLvKMS1wQS-vbIw&q=sword art online temporada 3 capitulo 1 sub espa%C3%B1ol&oq=sword art online temporada 3&gs_l=psy-ab.1.1.0i67k1l2j0l8.4652.4951.0.6388.2.2.0.0.0.0.395.650.2-1j1.2.0....0...1.1.64.psy-ab..0.2.635....0.mr5_VTgCxKQ&safe=high&cat=search-engine HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.202.159.92 - - [01/Nov/2017:00:22:06  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_AntiPorn&ver=0.19.6.9&url=http://www.excelsior.com.mx/europa#view-1&cat=news-and-media HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0"
49.145.255.136 - - [01/Nov/2017:00:22:23  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://ff81k.voluumtrk2.com/8dc38b77-7604-481b-bd63-11eaca6207e4?ID=74575527&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
81.103.165.211 - - [01/Nov/2017:00:22:39  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
129.7.0.190 - - [01/Nov/2017:00:22:55  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://elearning.uh.edu/bbcswebdav/pid-4102743-dt-content-rid-27567989_1/xid-27567989_1&cat=sport HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.137.235.112 - - [01/Nov/2017:00:23:12  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www1.tarjetacencosud.com.ar/sociosce/context/initPrivada.action&cat=unknown HTTP/1.1" 200 134 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
71.228.46.247 - - [01/Nov/2017:00:23:28  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.udacity.com/courses/data-science&cat=educational-institution HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
187.189.90.132 - - [01/Nov/2017:00:23:45  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
186.93.5.254 - - [01/Nov/2017:00:24:01  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://www.banesconline.com/MANTIS/WEBSITE/imagenesinhouse/imagenesinhouse.aspx&cat=financial-service HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
181.176.85.164 - - [01/Nov/2017:00:24:15  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://web.facebook.com/login.php?login_attempt=1&lwv=110&cat=social-networking HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
190.229.2.7 - - [01/Nov/2017:00:24:30  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_webfilter&ver=0.19.7.1&url=https://windows-file-explorer.softonic.com/?ex=DSK-309.5&cat=software-download HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
101.102.214.204 - - [01/Nov/2017:00:24:44  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=firefox_WebFilter&ver=0.19.6.9&url=https://www.chatwork.com/#!rid37781593&cat=computer-information HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0"
107.130.125.138 - - [01/Nov/2017:00:25:00  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://securepubads.g.doubleclick.net/static/3p_cookie.html&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.75 Safari/537.36"
187.204.183.145 - - [01/Nov/2017:00:25:14  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=http://educacion.app.jalisco.gob.mx/cas/Default.aspx&cat=government HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
88.26.241.195 - - [01/Nov/2017:00:25:28  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrlLexibook?appid=android_safebrowser&ver=1.2.4&url=https://www-cdn.whatsapp.net/android/2.17.393/WhatsApp.apk&cat=internet-communication HTTP/1.1" 200 149 "-" "Dalvik/1.6.0 (Linux; U; Android 4.4.2; MFS100ES Build/KOT49H)"
190.218.173.239 - - [01/Nov/2017:00:25:43  0000] "GET /axis2/services/WebFilteringService/getCategoryByUrl?app=chrome_antiporn&ver=0.19.7.1&url=https://offer.alibaba.com/exclusive_US_EN.html?tv=2&isFeature=true&imp=5b1aor1btqfgc6v2rk7&xp=-baxEQ7WcvtuK1U3YXZj3e11KlWATqHSv3HPF5tfWmkCmo1TaYp8yWdHlHT3IkKE4blNtS6vAcINPyVmlLV4u-mPaUrlz_JCb14tWvEsxKI&pid=1018325&td=Propellerads&cv=1020192&aff_id=182463618&ct=2&size=300_250&cn=PA&an=50001&bm=cpa&tp1=372702377464&src=saf&cat=business-and-economy HTTP/1.1" 200 133 "-" "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"

Code

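import org.apache.commons.lang3.StringUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.junit.Test

class AccessLogAgg {

  @Test
  def ipAgg(): Unit = {
    // Silence Spark's own logging so the result is easy to read
    Logger.getLogger("org").setLevel(Level.ERROR)

    // Create the SparkContext
    val conf = new SparkConf().setMaster("local[6]").setAppName("ip_agg")
    val sc = new SparkContext(conf)

    // Read the file to create the source dataset
    val path = "dataset/access_log_sample.txt"
    val source: RDD[String] = sc.textFile(path)

    // Take the IP (the first space-separated field) and pair it with a count of 1
    val ipRDD: RDD[(String, Int)] = source.map(x => (x.split(" ")(0), 1))

    // Basic cleaning: drop records with an empty IP.
    // Removing invalid records and other business-specific tidying would also go here.
    val cleanRDD: RDD[(String, Int)] = ipRDD.filter(x => StringUtils.isNotEmpty(x._1))

    // Aggregate the counts per IP
    val ipAggRDD: RDD[(String, Int)] = cleanRDD.reduceByKey(_ + _)

    // Sort by count; sortBy is ascending by default, so ask for descending order
    val sortRDD: RDD[(String, Int)] = ipAggRDD.sortBy(x => x._2, ascending = false)

    // Collect to the driver so the results print in sorted order
    sortRDD.collect().foreach(println)

    sc.stop()
  }
}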

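This small example raises six related questions, each pointing in a different direction.

1. Suppose the site's entire history has to be processed, around 1 TB of data. How would that be handled?

Put it on a cluster and use the cluster's machines to process it in parallel.

2. How does a job run on a cluster?

Put simply, parallel computing means using several computing resources at the same time to solve one problem. It has four key requirements:

  • The problem must be decomposable into parts that can be computed concurrently
  • Each part must be able to run on a different processor at the same time
  • There must be a mechanism for sharing memory
  • There must be an overall coordination mechanism to handle scheduling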
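
The four requirements above can be sketched on a single machine with plain Scala futures. This is only an analogy, not how Spark is implemented: the `countIps` and `merge` helpers and the sample lines are made up for illustration, threads stand in for cluster nodes, and the merged result map stands in for the shared state.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSketch {
  // Hypothetical helper: count IP occurrences in one chunk of log lines.
  def countIps(chunk: Seq[String]): Map[String, Int] =
    chunk.map(_.split(" ")(0)).groupBy(identity).map { case (ip, hits) => ip -> hits.size }

  // Hypothetical helper: merge two partial results by adding counts of matching keys.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (ip, n)) => acc.updated(ip, acc.getOrElse(ip, 0) + n) }

  def run(lines: Seq[String], chunkSize: Int): Map[String, Int] = {
    // 1. Decompose the problem into independent chunks.
    val chunks = lines.grouped(chunkSize).toList
    // 2. Process each chunk concurrently; the execution context handles scheduling.
    val partials = Future.sequence(chunks.map(c => Future(countIps(c))))
    // 3. Combine the partial results into the final answer.
    Await.result(partials, 10.seconds).foldLeft(Map.empty[String, Int])(merge)
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1.1.1.1 - -", "2.2.2.2 - -", "1.1.1.1 - -", "3.3.3.3 - -")
    println(run(lines, chunkSize = 2))
  }
}
```

On a real cluster the chunks would be file blocks on different machines, which is the topic of the next question.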

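3. Running on a cluster means the whole computation has to be decomposed. How?

Overview

  • A file in HDFS is split into Blocks
  • The computation can be divided along the same Blocks, with each Block handled by its own compute unit

Further notes

  • An RDD does not actually store data. The data is read from HDFS, and only when the computation needs it
  • An RDD must at least be partitionable, because the files in HDFS are already split into blocks. Each RDD partition represents the computation over one split of the source dataset, and being partitionable also means the RDD can be computed in parallel

4. Moving computation is cheaper than moving data, and this is a basic optimization. How is it achieved?

Each compute unit records where its storage unit lives, and the scheduler tries to run the computation there.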

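5. Running on a cluster needs many nodes to cooperate, which makes failures more likely. What happens when something fails?

Suppose RDD2 fails in the chain RDD1 → RDD2 → RDD3. There are two ways to recover:

  • Save RDD2's data and restore RDD2 directly from it, similar to HDFS's replication mechanism
  • Record RDD2's dependencies and rebuild RDD2 from its parent RDD, which moves and stores far less data

How is an RDD recovered from its parent?

  • Record that RDD2's parent is RDD1
  • Record RDD2's compute function. For example, in RDD2 = RDD1.map(…), map(…) is the compute function
  • When computing RDD2 fails, rebuild it from the parent RDD and the compute function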
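
The recovery steps above can be modelled in a few lines of plain Scala. The `MiniRDD` class below is a toy invented for this illustration and is not Spark's implementation: it only shows how remembering a parent and a compute function is enough to rebuild lost data.

```scala
object LineageSketch {
  // Toy stand-in for an RDD: it knows how to compute itself (which captures its
  // parent) and may or may not currently hold materialized data.
  final class MiniRDD[T](compute: () => Seq[T]) {
    private var data: Option[Seq[T]] = None

    // Return the data, recomputing it from the lineage when it is missing.
    def collect(): Seq[T] = data.getOrElse {
      val rebuilt = compute()
      data = Some(rebuilt)
      rebuilt
    }

    // Derive a child that records this RDD as its parent and f as its compute function.
    def map[U](f: T => U): MiniRDD[U] = new MiniRDD[U](() => this.collect().map(f))

    // Simulate losing the materialized data, for example because a node failed.
    def lose(): Unit = { data = None }
  }

  def main(args: Array[String]): Unit = {
    val rdd1 = new MiniRDD[Int](() => Seq(1, 2, 3))
    val rdd2 = rdd1.map(_ * 10)
    println(rdd2.collect()) // computed for the first time
    rdd2.lose()
    println(rdd2.collect()) // rebuilt from rdd1 and the map function
  }
}
```

Both calls yield the same elements, because nothing about RDD2 is lost as long as its parent and its function are still known.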

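6. If the job is very complex, with a long pipeline and many RDDs depending on one another, how can it be optimized?

Dependencies can be used for fault tolerance, as described above, but recomputing through a very long dependency chain is inefficient. In that case a different approach is better: save the state of the dataset itself.

Spark offers two ways to do this:

  • Caching
  • Checkpoint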
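
Both mechanisms are available directly on an RDD. The sketch below assumes Spark is on the classpath and runs in local mode; the object name, the sample computation, and the temporary checkpoint directory are choices made for this example (a real cluster would normally point the checkpoint directory at a reliable store such as HDFS).

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object PersistSketch {
  // Returns (row count, whether the RDD ended up checkpointed).
  def run(): (Long, Boolean) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("persist_sketch")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a local temp directory is good enough for a demo.
      sc.setCheckpointDir(Files.createTempDirectory("rdd-checkpoint").toString)

      val evens = sc.parallelize(1 to 100).map(x => x * x).filter(_ % 2 == 0)

      // Cache: keep computed partitions in memory; the lineage stays as a fallback.
      evens.cache()
      // Checkpoint: mark the RDD to be saved to the checkpoint directory.
      // This has to be requested before the first action on the RDD.
      evens.checkpoint()

      val n = evens.count() // first action: computes the data, then writes the checkpoint
      (n, evens.isCheckpointed)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Calling `cache()` before `checkpoint()` is a common pairing, since it lets the checkpoint write reuse the cached partitions instead of computing the RDD a second time.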

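RDD revisited

Goals

  1. Understand why RDD came about
  2. Understand the main characteristics of RDD
  3. Understand the five properties of RDD

Why did RDD come about?

Before RDD, MapReduce was the mainstream approach. How did MapReduce run iterative computations?

MapReduce jobs had no in-memory way to share data with one another, so they could only share it through disk.

This is clearly inefficient.

How does RDD solve the inefficiency of iterative computation?

In Spark, the final Job3 is logically computed as Job3 = (Job1.map).filter. The whole chain shares memory, and intermediate results do not have to be written to a reliable distributed file system.

This approach stays fault tolerant while offering more flexibility and faster execution.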

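Characteristics of RDD

RDD is both a dataset and a programming model

An RDD is a data structure that also exposes a high-level API. That API closely resembles the Scala collections API, and it too is built from operators.

RDD operators fall into two broad categories:

  • Transformation: operations such as map, flatMap, and filter
  • Action: operations such as reduce, collect, and count

When an RDD program runs, a transformation is not executed at the point where it is encountered. Real execution is triggered only when an Action is reached. This behavior is called lazy evaluation.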
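
Lazy evaluation can be observed by counting how often the function passed to `map` actually runs. The sketch below assumes Spark is on the classpath and runs in local mode; the object name and the accumulator name are made up for the example.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LazySketch {
  // Returns how many times the map function had run before and after the action.
  def run(): (Long, Long) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("lazy_sketch")
    val sc = new SparkContext(conf)
    try {
      val calls = sc.longAccumulator("map_calls")
      // Transformation: only recorded, nothing runs yet.
      val doubled = sc.parallelize(Seq(1, 2, 3, 4, 5)).map { x => calls.add(1L); x * 2 }
      val before = calls.sum
      // Action: triggers the actual computation.
      doubled.collect()
      val after = calls.sum
      (before, after)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

In a local run with no task retries, the counter stays at 0 after `map` and reaches 5 only once `collect()` has run.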

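RDD can be partitioned

RDD is the core abstraction of a distributed computing framework, so it must support partitioned computation. Only a partitioned dataset can take advantage of a cluster's parallelism.

An RDD also does not need to be materialized at all times. It may hold no data, as long as it has enough information to know what it was computed from. This makes for very efficient fault tolerance.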
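
Partitioning is visible through the RDD API. A minimal sketch, assuming Spark is on the classpath and local mode; the object name and the sample numbers are chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionSketch {
  // Returns (partitions at creation, partitions after repartition, sizes of the original partitions).
  def run(): (Int, Int, Seq[Int]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("partition_sketch")
    val sc = new SparkContext(conf)
    try {
      // Ask for 3 partitions when creating the RDD.
      val rdd = sc.parallelize(1 to 10, numSlices = 3)
      // repartition returns a new RDD; the original keeps its 3 partitions.
      val wider = rdd.repartition(5)
      // glom turns each partition into an array so its contents can be inspected.
      val sizes = rdd.glom().collect().map(_.length).toSeq
      (rdd.getNumPartitions, wider.getNumPartitions, sizes)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Each partition becomes one task when an action runs, so the partition count sets the upper bound on how many tasks can work on the RDD at once.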

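RDD is read-only

An RDD is read-only and cannot be modified in any way. The fact that RDD and HDFS are read-only does not mean every distributed storage system must be designed that way, but a read-only design reduces complexity considerably. An RDD needs to be fault tolerant, lazily evaluated, and movable to where the data lives, all of which make modification hard to support.

  • RDD2 may hold no data, only its dependencies and compute function. What would there be to modify?
  • If supporting modification meant the data had to be stored, how would fault tolerance work?
  • If modification were allowed, how would the row to modify be located? RDD transformations are coarse-grained, meaning an RDD has no notion of where any individual row lives.

RDD is fault tolerant

RDD supports fault tolerance in two ways:

  • Save the dependencies between RDDs along with the compute functions, and recompute when an error occurs
  • Store the RDD's data in an external storage system and read it back when an error occurs. This is Checkpoint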

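What does Resilient Distributed Dataset mean?

Distributed

  • An RDD supports partitioning and can run on a cluster

Resilient

  • An RDD supports efficient fault tolerance
  • The data in an RDD can be cached in memory, on disk, or in external storage

Dataset

  • An RDD may hold no concrete data, only the information needed to create itself, such as its dependencies and compute function
  • An RDD can also be cached, which amounts to storing concrete data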

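Summary: the five properties of RDD

First, a recap of the capabilities an RDD needs, as discussed above:

  • An RDD has partitions
  • An RDD must be fault tolerant through its dependencies and compute function
  • An RDD must optimize for data locality
  • An RDD supports MapReduce-style computation, so it must be able to shuffle data

So what should an RDD contain? From the RDD designer's point of view, which properties does the class need at a minimum?

  • Partition List: the list of partitions that make up the RDD. The number of partitions can be specified when the RDD is created, or changed by applying an operator that produces a new RDD
  • Compute Function: to support fault tolerance, the compute function applied in the transformation between RDDs must be recorded
  • RDD Dependencies: the dependencies between RDDs. Each RDD records its parent RDDs, which enables both fault tolerance and computation
  • Partitioner: to perform a shuffle, there must be a function that decides which partition each record is sent to
  • Preferred Location: to achieve data locality, that is, moving computation rather than data, the preferred location of each RDD partition must be recorded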
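
Most of these properties can be inspected on a live RDD. The sketch below assumes Spark is on the classpath and local mode; the object name and the sample pairs are illustrative. The compute function is the one property it does not call, because tasks invoke it internally.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object FivePropertiesSketch {
  // Returns (partition count, dependency count, has partitioner, preferred locations of the source).
  def run(): (Int, Int, Boolean, Seq[String]) = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("five_props_sketch")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 1), ("a", 1)), 2)
      val counts = pairs.reduceByKey(_ + _, 3) // shuffle into 3 partitions

      // 1. Partition list
      val numPartitions = counts.partitions.length
      // 2. Compute function: each RDD subclass implements compute(split, context);
      //    tasks call it, user code normally does not.
      // 3. Dependencies: counts depends on pairs through a shuffle dependency.
      val numDeps = counts.dependencies.length
      // 4. Partitioner: set on key-value RDDs produced by a shuffle.
      val hasPartitioner = counts.partitioner.isDefined
      // 5. Preferred locations: empty for an in-memory collection; for an
      //    HDFS-backed RDD this would name the hosts that store the block.
      val locations = pairs.preferredLocations(pairs.partitions(0))
      (numPartitions, numDeps, hasPartitioner, locations)
    } finally sc.stop()
  }

  def main(args: Array[String]): Unit = println(run())
}
```

`counts.toDebugString` prints the same lineage in readable form, which is handy when working out where a shuffle happens.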

Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/158913.html Original link: https://javaforall.cn
