Caller ID Superfecta WhoCalled UK returns 403 Forbidden, but works from a browser

Hi there, I wonder if anyone out there can help me fix this error. I’d really like to use this fantastic UK caller ID lookup source, but I am not much of a programmer, which is why I need your help.

I believe this happens because, when WhoCalled UK is executed, the curl command does not set a valid User-Agent header and is therefore blocked by the https://who-called.co.uk website.

I didn’t want to just ask for your help; I wanted to have a go myself. This is what I did, but unfortunately I still cannot get it to work.

All ideas welcome and thanks again.

> <?php
> /*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
>  * Developer Notes:
>  *        TOS for this site here: 
>  *        https://who-called.co.uk/Terms-of-Service
>  *        dated 6th November 2014, there is no language that prohibits automated lookups
>  *        The user is obligated to only use the service for personal use and obligated to add
>  *        business phone numbers
>  * 
>  *
>  * Version History
>  *        2017-10-21   Initial commit by lgaetz
>  *        2017-10-21   Added some basic spam logic
>  *        2017-10-23   change to only load url once, eliminate date retrieve
>  *        2018-04-15   bug fix, don't assume $average_rate has a value
>  *        2019-07-26   regex updates and additional logging
>  *        2020-09-26   COVID19 edition - fixed comment/harrassing logic
>  *
>  *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***/
> 
>  class WhoCalled_UK extends superfecta_base {
> 
> 	public $description = "https://who-called.co.uk - A datasource devoted to identifying telemarketers. These listings are provided by other users of this service. Review TOS at https://who-called.co.uk/Terms-of-Service";
> 	public $version_requirement = "2.11";
> 	public $source_param = array(
> 		'Comment_Number_Threshold' => array(
> 			'description' => 'Minimum number of comments required to trust the rating. Set to zero to disable and trust all ratings.',
> 			'type' => 'number',
> 			'default' => '3'
> 		),
> 		'Search_Number_Threshold' => array(
> 			'description' => 'Minimum number of searches required to trust the rating. Set to zero to disable and trust all ratings.',
> 			'type' => 'number',
> 			'default' => '0'
> 		),
> 	);
> 
> 	function get_caller_id($thenumber, $run_param=array()) {
> 
> 		// initialize variables,if user has not set anything set user params to their defaults
> 		if (!isset($run_param['Comment_Number_Threshold'])){
> 			$run_param['Comment_Number_Threshold'] = '3';
> 		}
> 		if (!isset($run_param['Search_Number_Threshold'])){
> 			$run_param['Search_Number_Threshold'] =  '0';
> 		}
> 		
> 		// load page for number
> 		$url = "https://who-called.co.uk/Number/$thenumber";    // working 2017-10-21
> 		$options = [ 'http' => [ 'user_agent' => '"Mozilla/5.0 (Linux; Android 10; SM-G996U Build/QP1A.190711.020; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Mobile Safari/537.36"',  ], ];
> 		$context = stream_context_create($options);
> 
> 
> 		$this->DebugPrint("Searching $url ... ");
> 		$value = $this->get_url_contents($url, false, $context);
> 		
> 
> 	
> 		// find average rate 
> 		$pattern = '~<div class="call-stats-item">\s+<span>User reputation:</span>\s+<span><b><span class="([A-Za-z]+(-[A-Za-z]+)+) ([A-Za-z]+(-[A-Za-z]+)+)"> (.+?) </span></b></span>\s+</div>~s';    // working 2019-07-26
> 		$matches = null;
> 		$foo=preg_match($pattern,$value,$matches);
> 		if (isset($matches[1])) {
> 			$average_rate = trim($matches[5]);
> 			$this->DebugPrint("Average Rate: ".$average_rate);
> 		}
> 		
> 		// find number of searches
> 		$pattern = '~<div class="call-stats-item">\s+<span>Number of searches:</span>\s+<span><b>(.+?)\s+</b></span>\s+</div>~s';   // working 2017-10-21
> 		$matches = null;
> 		$foo=preg_match($pattern,$value,$matches);
> 		if (isset($matches[1])) {
> 			$number_of_searches = trim($matches[1]);
> 		} else {
> 			$number_of_searches = 0;
> 		}
> 
> 		// find number of comments
> 		$pattern = '~<div class="call-stats-item">\s+<span>Number of comments:</span>\s+<span><b><a style="color: black" href="#user-reviews">(.+?) </a></b></span>\s+</div>~s';           // working 2019-07-26
> 		$matches = null;
> 		$foo=preg_match($pattern,$value,$matches);
> 		if (isset($matches[1])) {
> 			$number_of_comments = trim($matches[1]);
> 		} else {
> 			$number_of_comments = 0;
> 		}
> 
> 		$this->DebugPrint("Number of Searches: ".$number_of_searches);
> 		$this->DebugPrint("Search Threshold  : ".$run_param['Search_Number_Threshold']);
> 		$this->DebugPrint("Number of Comments: ".$number_of_comments);
> 		$this->DebugPrint("Comment Threshold : ".$run_param['Comment_Number_Threshold']);
> 
> 		// site should return a rate text string, dangerous, harassing, unknown, neutral, safe
> 		if (isset ($average_rate)) {
> 			switch (strtolower($average_rate)) {
> 				case "negative":
> 					if($number_of_searches < $run_param['Search_Number_Threshold'] || $number_of_comments < $run_param['Comment_Number_Threshold']) {
> 						$this->DebugPrint("Number flagged as Dangerous, but comment/search threshold not met");
> 					} else {
> 						$this->DebugPrint("Number flagged as Dangerous, comment/search threshold met, setting call as SPAM");
> 						$this->spam = true;
> 					}
> 					break;
> 				case "undetermined":
> 					$this->DebugPrint("Number flagged as Unknown, doing nothing");
> 					break;
> 				case "positive":
> 					$this->DebugPrint("Number flagged as Safe, doing nothing");
> 					break;
> 				default:
> 					$this->DebugPrint("Site returned unexpected rating of ".$average_rate.", doing nothing");
> 					break;
> 			}
> 		}
> 	}
> 
> }

You need to edit this and make it more readable using the code format option.

The function get_url_contents is defined in includes/superfecta_base.php:

function get_url_contents($url, $post_data=false, $referrer=false, $cookie_file=false, $useragent=false) 

It looks like you can pass the user agent as the 5th argument, so the line in your file would become:

$value = $this->get_url_contents($url, false, $context,false,"user agent here");

No idea whether this works or not.
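To make the argument positions concrete, here is a minimal sketch. The stub below is hypothetical: it copies only the signature quoted above and returns its arguments by name, whereas the real method in superfecta_base.php performs the request. The example number and the user agent string are placeholders. It shows that a stream context passed third lands in $referrer, while the user agent has to be passed fifth. Note also that the string in the posted file has literal double quotes inside the single quotes, so those would become part of the value.

```php
<?php
// Hypothetical stub: it copies only the signature quoted above and returns
// the arguments by parameter name. The real method performs the request.
function get_url_contents_stub($url, $post_data = false, $referrer = false, $cookie_file = false, $useragent = false)
{
    return compact('url', 'post_data', 'referrer', 'cookie_file', 'useragent');
}

$url = 'https://who-called.co.uk/Number/01234567890'; // made-up example number
$ua  = 'Mozilla/5.0 (X11; Linux x86_64)';             // placeholder user agent string

// Call as in the posted file: the stream context lands in $referrer,
// and $useragent keeps its default of false.
$context = stream_context_create(['http' => ['user_agent' => $ua]]);
$posted  = get_url_contents_stub($url, false, $context);

// Call with the user agent in the 5th position and nothing in $referrer.
$suggested = get_url_contents_stub($url, false, false, false, $ua);

echo 'posted call sets useragent: ', var_export($posted['useragent'], true), PHP_EOL;
echo 'suggested call sets useragent: ', var_export($suggested['useragent'], true), PHP_EOL;
```

In the module itself the equivalent call would presumably be `$this->get_url_contents($url, false, false, false, $ua)`, though whether the site then accepts the request is a separate question.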

Thanks for the suggestion, but I still get that Forbidden error when it tries to scrape the site.
I’m obviously doing something wrong, and I’m stuck.

I can’t believe that this is acceptable use for the site, which is clearly aimed at serving advertising to trackable individuals. Looking at the terms of use, I’d say it is more likely that they were not written by a lawyer than that they were intended to allow screen scraping and sharing the results amongst the other users of your PABX.

It may well be that you are being blocked because your usage pattern looks like a machine’s, not a human’s (for many sites now, that would happen in the Cloudflare system, before you even got to the site proper).

One of the main reasons for requiring an acceptable User Agent is to try to detect and reject automated accesses, whether they are attempting to bypass advertising or to extract a local copy of the site’s database.
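One way to narrow down where the rejection happens is to look at the response headers that come back with the 403. The sketch below is only a rough heuristic under stated assumptions: the function name and the returned labels are made up for illustration, and it relies on Cloudflare marking its challenge pages with a `cf-mitigated: challenge` header and on proxied responses carrying `Server: cloudflare`. A 403 with that Server header may still have originated at the site itself.

```php
<?php
// Rough classification of a response from its status code and headers.
// $headers is a name => value array; names are compared case-insensitively.
function classify_response(int $status, array $headers): string
{
    $h = array_change_key_case($headers, CASE_LOWER);

    // Cloudflare documents this header on its challenge pages.
    if (strtolower($h['cf-mitigated'] ?? '') === 'challenge') {
        return 'cloudflare challenge';
    }
    // Proxied through Cloudflare; the 403 may come from its rules or the origin.
    if ($status === 403 && stripos($h['server'] ?? '', 'cloudflare') !== false) {
        return '403 served via cloudflare';
    }
    if ($status === 403) {
        return '403 from origin';
    }
    return 'no block detected';
}

echo classify_response(403, ['Server' => 'cloudflare']), PHP_EOL;
```

If the result points at a challenge page, changing the User-Agent alone is unlikely to help, since challenges generally expect a browser to run JavaScript.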

LOAD OF TOS, mate. Yes, we know why it returns 403, as I stated. If you were going to chime in, you could have actually made an attempt to fix the script instead of pointing out the obvious.

Sadly, it looks like this isn’t going to get fixed any time soon, or ever :cry:, because:

  1. The Who Called UK site uses Cloudflare: Same Origin Policy.

I wonder if someone can make the script fetch the homepage first? That way it might comply with the SOP and return the correct results.
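For anyone who wants to try the homepage-first idea, here is a minimal sketch using PHP's curl extension. The function names are hypothetical and nothing here comes from Superfecta itself. It requests the homepage, keeps any cookies in memory on the same handle, then requests the number page with the homepage as Referer. As far as I understand, the Same Origin Policy is enforced by browsers rather than by servers, so this would only help if the site checks cookies or the Referer header, and it would not get past a JavaScript challenge.

```php
<?php
// Build the cURL options for one request. An empty CURLOPT_COOKIEFILE turns on
// the in-memory cookie engine, so cookies persist across requests on one handle.
function build_curl_options(string $url, string $userAgent, ?string $referer = null): array
{
    $opts = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_COOKIEFILE     => '',
        CURLOPT_TIMEOUT        => 10,
    ];
    if ($referer !== null) {
        $opts[CURLOPT_REFERER] = $referer;
    }
    return $opts;
}

// Request $home first (only to collect cookies), then $page on the same handle.
function fetch_with_warmup(string $home, string $page, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($home, $userAgent));
    curl_exec($ch);

    curl_setopt_array($ch, build_curl_options($page, $userAgent, $home));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return ['status' => $status, 'body' => $body];
}

// Usage (makes real network requests, so it is left commented out):
// $r = fetch_with_warmup('https://who-called.co.uk/', 'https://who-called.co.uk/Number/01234567890', 'Mozilla/5.0 ...');
// echo $r['status'], PHP_EOL;
```

If the status is still 403 after this, the block is probably not cookie or Referer based.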

Unfortunately I cannot fix this myself.
I’d be grateful if someone else would like to try and fix this superfecta script.

Cheers,


Andy.
