Ben, sorry for not responding sooner, but here are my thoughts. Let’s suppose you could do a data dip in real time. For example, the incoming caller ID is something like 8005552368 (which could also arrive as 18005552368, or even +18005552368). It would then be sent to you as part of an HTTP request such as this:
http://www.everycall.us/query?8005552368 (this is just a sample; any format would be fine as long as the number can be sent as part of the HTTP request).
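To make the three number formats concrete, here is a rough Python sketch of how a client might reduce them to one form before building the request. The function names are made up for illustration, the URL is just the sample from above, and I am assuming ten-digit North American numbers are the "allowable" format.

```python
import re

SAMPLE_BASE_URL = "http://www.everycall.us/query"  # sample URL only, not a real endpoint


def normalize_caller_id(raw):
    """Return the bare ten-digit number, or None if it does not look North American."""
    digits = re.sub(r"\D", "", raw)  # drop "+", spaces, dashes, parentheses
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # strip the leading country code 1
    return digits if len(digits) == 10 else None


def build_query_url(raw, base=SAMPLE_BASE_URL):
    """Build the lookup URL, or return None when the number cannot be normalized."""
    number = normalize_caller_id(raw)
    return None if number is None else f"{base}?{number}"
```

With this, 8005552368, 18005552368 and +18005552368 all produce the same request, so the server only has to deal with one format.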
The “page” returned should probably be one very simple line of plain text (or maybe two, see below). It should contain either a score or some other character(s) indicating that the number is not in one of the allowable formats (meaning it might be an international call). However, if the number is not in a usual format but has still been associated with “spam” calls, then a score should be returned anyway, if you have that information in your database.
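On the server side, the order of those checks matters: look the number up first, and only fall back to the "not an allowable format" marker when nothing is known about it. A minimal Python sketch of that precedence follows. The "?" marker, the score of 0 for unknown numbers, and the dictionary standing in for your database are all my assumptions, not anything you have specified.

```python
import re

NOT_SUPPORTED = "?"  # hypothetical marker for "number not in an allowable format"


def canonical(raw):
    """Strip punctuation and a leading country code 1, leaving only digits."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits[0] == "1":
        return digits[1:]
    return digits


def reply_for(raw, spam_scores):
    """Return the single line of plain text to send back for one query."""
    number = canonical(raw)
    if number in spam_scores:
        # Known number: return its score even if the format is unusual.
        return str(spam_scores[number])
    if len(number) != 10:
        return NOT_SUPPORTED  # unknown and not a ten-digit number
    return "0"  # assumption: unknown but well-formed numbers score zero
```

For example, a twelve-digit international number that is already in the table still gets its score back, while an unknown one gets the marker.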
The biggest issue would be that the server would have to be reasonably fast, so that call processing is not delayed significantly. You may also want to keep records of the number of data dips per month by IP address, so that if anyone is abusing the service you can block their IP address and/or ask them to contribute toward your server expenses. If all you are returning is a single line of text containing a one- or two-digit score then this may never become an issue, but it seems that whenever anyone offers a free service there is always someone who will figure out a way to abuse it. And then there’s also the issue that someone’s bad code might get stuck in a loop and make the same request every second or two for hours on end.
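The per-IP bookkeeping does not need to be elaborate. Here is a small Python sketch of a monthly counter with a cutoff. The class name, the in-memory storage and the limit are purely illustrative; a real server would presumably keep these counts in its database.

```python
from collections import Counter
from datetime import date


class MonthlyDipCounter:
    """Count data dips per (year, month, IP) and flag clients over a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def record(self, ip, today=None):
        """Record one request; return False once this IP exceeds the monthly limit."""
        today = today or date.today()
        key = (today.year, today.month, ip)
        self.counts[key] += 1
        return self.counts[key] <= self.limit

    def heaviest(self, year, month, n=10):
        """Return up to n (ip, count) pairs for the month, busiest first."""
        monthly = Counter(
            {ip: c for (y, m, ip), c in self.counts.items() if (y, m) == (year, month)}
        )
        return monthly.most_common(n)
```

A runaway script hitting the server every second would pass any sensible monthly limit within a few hours, so the same counter covers both deliberate abuse and the stuck-in-a-loop case.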
Anyway, the single line of text should be reasonably easy to handle in an AGI Perl script. The biggest issue in FreePBX would be figuring out how to intercept the incoming call and then direct it based on the result of the data dip. I can think of at least one way to do it: change the context in a trunk definition from “from-trunk” to a new context in extensions_custom.conf, then in that custom context call the AGI script and, based on what that script returns, either pass the call along to the normal “from-trunk” context or send it elsewhere (maybe a congestion tone). This is not something I could whip out in ten minutes, but on the other hand it’s probably fairly trivial to code (especially for anyone who regularly writes code in Perl or PHP). The biggest concern would be making sure that the AGI script uses a short timeout on the HTTP request, so that if your site cannot be reached for some reason the call flow is not delayed by more than a second or two.
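For what it’s worth, here is roughly what the AGI side might look like. I have sketched it in Python rather than Perl, and it is untested against a live Asterisk box. The SPAMSCORE variable name and the two-second timeout are my own choices, and the URL is only the sample from above. The key point is that any timeout or network error simply yields an empty value, so the call carries on as normal.

```python
import sys
import urllib.request

LOOKUP_URL = "http://www.everycall.us/query?{number}"  # sample URL, not a real endpoint
TIMEOUT_SECONDS = 2  # short, so an unreachable server cannot stall the call


def read_agi_env(stream):
    """Read the 'agi_key: value' lines Asterisk sends, up to the blank line."""
    env = {}
    for line in stream:
        line = line.strip()
        if not line:
            break
        key, _, value = line.partition(":")
        env[key.strip()] = value.strip()
    return env


def fetch_score(number, opener=urllib.request.urlopen):
    """Return the integer score, or None on timeout, network error or odd reply."""
    try:
        with opener(LOOKUP_URL.format(number=number), timeout=TIMEOUT_SECONDS) as resp:
            lines = resp.read(256).decode("ascii", "replace").splitlines()
    except (OSError, ValueError):  # URLError and socket timeouts are OSError subclasses
        return None
    first = lines[0].strip() if lines else ""
    return int(first) if first.isdigit() else None


def run(stdin=sys.stdin, stdout=sys.stdout, opener=urllib.request.urlopen):
    """Look up the caller and hand the result to the dialplan in SPAMSCORE."""
    env = read_agi_env(stdin)
    score = fetch_score(env.get("agi_callerid", ""), opener)
    value = "" if score is None else str(score)
    stdout.write(f'SET VARIABLE SPAMSCORE "{value}"\n')  # hypothetical variable name
    stdout.flush()
    stdin.readline()  # consume Asterisk's reply to the command
```

A real script would end with a call to run(), and the custom context would then test SPAMSCORE before jumping to “from-trunk” or to the alternate destination.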
If your service were to become really popular then maybe the developers would consider adding a field on the inbound route configuration pages that would optionally do the data dip and then send the call to a “black hole” (or some other alternate) destination if the score were above a specified level.
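The decision itself is tiny. As a sketch in Python, with made-up destination names and the assumption that a failed lookup (no score) should always let the call through:

```python
def destination(score, threshold, normal="from-trunk", alternate="blackhole"):
    """Pick where to send the call; only a score above the threshold diverts it."""
    if score is not None and score > threshold:
        return alternate
    return normal
```

So with a threshold of 50, a score of 87 goes to the alternate destination, while a score of 50, a score of 0, or no score at all goes to the normal context.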
Of course, in my ideal world your service would return two lines of plain-text information: the score as discussed above on the first line, and a best guess at the Caller ID name on the second line. That way, if the Caller ID name field on the incoming call were already populated we could ignore the second line; otherwise we could use it to provide some information about the call. The bonus is that for known spam callers you might associate a company name (or some other descriptive information about the type of call, such as “WARRANTY SCAM”, although you’d have to be very careful about doing something like that so as not to get sued for misidentifying a legitimate company), so that the name or information could be made to appear in the system log even if the original incoming Caller ID name were blank. I’m obviously not saying you should go paying the phone companies to do a real Caller ID data dip, but for numbers you know nothing about you could always return the state or province associated with the area code, like some cell phone companies do (e.g. NEW YORK CALL). That second line might add complexity and would also increase the load on your servers, though, so it’s entirely up to you whether you’d want to do something like that.
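To show how little the two-line version would add, here is a Python sketch of both ends. Everything in it is illustrative: the function names are mine, the area code table holds just a handful of sample entries, and the dictionary of known names stands in for whatever you keep in your database.

```python
# A few sample entries only; a real table would cover every area code.
AREA_CODE_REGIONS = {
    "212": "NEW YORK",
    "312": "ILLINOIS",
    "415": "CALIFORNIA",
    "416": "ONTARIO",
}


def name_line(number, known_names):
    """Server side: second line of the reply for a ten-digit number."""
    if number in known_names:
        return known_names[number]
    region = AREA_CODE_REGIONS.get(number[:3])
    return f"{region} CALL" if region else ""


def parse_reply(text):
    """Client side: split the reply into (score or None, name or '')."""
    lines = text.splitlines()
    first = lines[0].strip() if lines else ""
    score = int(first) if first.isascii() and first.isdigit() else None
    name = lines[1].strip() if len(lines) > 1 else ""
    return score, name


def display_name(incoming_name, looked_up_name):
    """Keep the name that came with the call; use the looked-up one only if blank."""
    return incoming_name.strip() or looked_up_name
```

A client that only cares about the score can ignore the second line entirely, so a one-line reply and a two-line reply can be handled by the same parsing code.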