Logging In to a Website with PHP and Scraping Data

2023-02-20 14:51:37

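Sometimes you need to log in to a website and then scrape some useful information from it, and doing that by hand is tedious. Many people get the login request working quickly but then find that every page they request afterwards is still rejected. The reason is that they do not send the cookie returned by the login response, so the server treats the second request as a separate session. Here is the code.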

<?php
// test.php
// Sends one HTTP request over a raw socket and returns the raw response
// (status line, headers, and body) plus any cookies the server set.
function getWebContent($host, $page = "/", $paramstr = "", $cookies = "", $method = "POST", $port = 80)
{
    $fp = fsockopen($host, $port);
    if (!$fp) {
        return false;
    }
    $method = strtoupper($method) == "POST" ? "POST" : "GET";
    if ($method == "GET" && $paramstr) {
        $page .= "?" . $paramstr;
    }
    // HTTP header lines end with \r\n; a blank line ends the header block.
    $out  = "$method $page HTTP/1.1\r\n";
    $out .= "Accept: */*\r\n";
    $out .= "Host: " . $host . ($port == 80 ? "" : ":" . $port) . "\r\n";
    if ($method == "POST") {
        $out .= "Content-Length: " . strlen($paramstr) . "\r\n";
        $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    }
    if ($cookies) {
        $out .= "Cookie: " . $cookies . "\r\n";
    }
    // Close rather than Keep-Alive, so that feof() below becomes true
    // as soon as the server has sent the whole response.
    $out .= "Connection: Close\r\n\r\n";
    if ($method == "POST") {
        $out .= $paramstr;
    }
    fwrite($fp, $out);
    $cookie = "";
    $content = "";
    while (!feof($fp)) {
        $str = fgets($fp);
        // Keep only the name=value part of each Set-Cookie header.
        if (preg_match("/^Set-Cookie:\s*([^;\r\n]+)/i", $str, $matches)) {
            $cookie .= ($cookie ? "; " : "") . trim($matches[1]);
        }
        $content .= $str;
    }
    fclose($fp);
    return array('content' => $content, 'cookie' => $cookie);
}

$params = "name=admin&pwd=admin";
$rs = getWebContent("127.0.0.1", "/test/login.php", $params, "", "POST", 8080);
echo $rs['content'];
// Passing the cookie from the login response is the key step; without it
// the server treats the second request as a new session.
$rs = getWebContent("127.0.0.1", "/test/index.php", "", $rs['cookie'], "POST", 8080);
echo $rs['content'];
?>

<?php
// login.php
// Default to an empty string so a missing parameter does not raise a warning.
$name = $_REQUEST['name'] ?? '';
$pwd = $_REQUEST['pwd'] ?? '';
if ($name == "admin" && $pwd == "admin") {
    setcookie("cname", $name);
    echo "success";
} else {
    echo "failed";
}
?>

<?php
// index.php
if (isset($_COOKIE['cname']) && $_COOKIE['cname']) {
    echo "<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li><li>6</li></ul>";
} else {
    echo "please login first!";
}
?>

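Save the three files above separately. Put login.php and index.php in a test directory under the web server's document root (the examples assume the server listens on port 8080). test.php can go in any directory. Then run php test.php from the command line to see the result.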
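The cookie extraction inside getWebContent can also be looked at in isolation. Below is a minimal sketch of that step, assuming the raw response is already available as a string. The function name buildCookieHeader and the sample response (including the PHPSESSID cookie) are made up for illustration and are not part of the original files.

```php
<?php
// Hypothetical helper (not part of the original test.php): collects the
// name=value pair from every Set-Cookie line of a raw HTTP response and
// joins them into a value suitable for a Cookie request header.
function buildCookieHeader(string $rawResponse): string
{
    // Headers end at the first blank line; the body is ignored.
    $parts = preg_split("/\r?\n\r?\n/", $rawResponse, 2);
    $headerBlock = $parts[0];

    $pairs = [];
    foreach (preg_split("/\r?\n/", $headerBlock) as $line) {
        // Keep only "name=value"; attributes such as path or expires
        // follow the first semicolon and are not sent back to the server.
        if (preg_match('/^Set-Cookie:\s*([^;]+)/i', $line, $m)) {
            $pairs[] = trim($m[1]);
        }
    }
    return implode('; ', $pairs);
}

// Sample response invented for this example.
$response = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: cname=admin; path=/\r\n"
    . "Set-Cookie: PHPSESSID=abc123; HttpOnly\r\n"
    . "Content-Type: text/html\r\n"
    . "\r\n"
    . "success";

echo buildCookieHeader($response), "\n"; // prints: cname=admin; PHPSESSID=abc123
```

Restricting the search to the header block means a body line that happens to begin with "Set-Cookie:" is not mistaken for a header.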

There is also a simpler approach: use cURL. The following code can replace test.php.

<?php
$post_data = array(
    "name" => "admin",
    "pwd" => "admin",
);
// Create a temporary file to hold the cookies
$cookie_jar = tempnam(sys_get_temp_dir(), 'cookie');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost:8080/test/login.php");
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Send a POST request
curl_setopt($ch, CURLOPT_POST, 1);
// Attach the POST fields
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_data);
// Save the cookies from the response in the $cookie_jar file
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
echo curl_exec($ch);
curl_close($ch);
// Since PHP 8.0 curl_close() has no effect; the cookie file is written
// only when the handle is destroyed, so release it explicitly.
unset($ch);

$ch2 = curl_init();
curl_setopt($ch2, CURLOPT_URL, "http://localhost:8080/test/index.php");
curl_setopt($ch2, CURLOPT_HEADER, false);
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, 1);
// Send the cookies that were saved during login
curl_setopt($ch2, CURLOPT_COOKIEFILE, $cookie_jar);
echo curl_exec($ch2);
curl_close($ch2);
unlink($cookie_jar);
?>
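The two-handle version depends on a cookie file on disk. As a possible alternative, here is a sketch that reuses one cURL handle for both requests, so the cookies stay in memory. The function name fetchAfterLogin is hypothetical, and the sketch assumes the login endpoint accepts a URL-encoded POST body, as login.php does.

```php
<?php
// Hypothetical helper: logs in and then fetches a protected page on the
// same cURL handle, so no cookie file is needed.
function fetchAfterLogin(string $loginUrl, string $pageUrl, array $fields)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty file name turns on cURL's cookie engine without loading
    // or saving any file.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    // Step 1: POST the credentials; cookies set by the server are remembered.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    if (curl_exec($ch) === false) {
        return false; // the login request itself failed
    }

    // Step 2: switch back to GET and request the page with the same handle.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    return curl_exec($ch); // response body, or false on failure
}
```

With the test files from this article it could be called as echo fetchAfterLogin("http://localhost:8080/test/login.php", "http://localhost:8080/test/index.php", array("name" => "admin", "pwd" => "admin"));. A cookie file is still the better fit when the session has to survive across separate script runs.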
