>> TANCRED: …the kind of data experiment we
did, looking at a lot of our data in relation to our gambling customers right around the
time of the World Cup. Basically to see what we could see. I run Product Management at
Quova. Tobias, I’ll let you introduce yourself.>>SPECKBACHER: All right. So, I’m Tobias
Speckbacher. I’m the VP of Emerging Technologies at Quova, which really means that I get to
work with lots of different companies these days that are pre-product or on their first product
line, to see how they fit into our infrastructure and make things work there. I’ve been with
Quova for 10 years. I’ve had multiple roles there, so I’ve run through pretty much all the technical
positions that we have there, from research to operations. And recently, I moved
into this role. And that’s about it.>>TANCRED: Yeah. So, a little bit about Quova.
Quova provides information about IP addresses and we provide geographic and network information.
And what our customers do with that information is basically provide richer, more engaging,
more relevant experiences for their users. That could be geo-targeting and other kinds
of targeting for search and other kinds of advertising, or financial services and e-commerce
companies mitigating the risk of fraud. You have video-on-demand and sports
companies who stream live video content and other rich content. One of the reasons they’re
able to do that with copyrighted content is because solutions like ours allow them to
comply with the regulations and the other contracts they have that restrict them from
streaming content in other places. Major League Baseball is an example in the U.S. where legislation
actually prevents them from streaming live games in markets where they’ve sold the rights
to broadcasters. So the reason they can stream live games is because they can tell where
you are if you’re in a home market and restrict that content. And that’s how our gaming
customers use it as well. Online gambling obviously has different restrictions
in different places around the world. The reason you can gamble online where it is legal
is because online gambling companies can tell whether or not you’re somewhere
where it’s legal. So, as I said, we have a number of gambling customers
mostly in the U.K., but all over the world. And we took some of their data and looked
at it right around the time of the World Cup, as I said. Tobias will talk a little bit
about the methodology and our data.
>>SPECKBACHER: Okay. So, the way we get the data is through what we call a closed feedback
system: as customers use our data, we get individual transaction data
back from them, which we use for accounting purposes, but also to focus our research efforts.
The IPv4 space is basically about 4.3 billion addresses.
Not all of those are assigned; there are about 2.8 billion addresses that are assigned right
now. And again, not all of those are used by actual users; a lot of that is infrastructure
space. And the majority of the traffic comes from a subset of that. So, we use that feedback
data as a significant sample to target the areas that are important for our customers
to focus our research on. The other thing is that we
release our IP data every week. As we get the feedback data back, we join individual
IP addresses back onto all the dimensions that we have available on that specific IP
address or the network at large and we store that. So, we can basically perform dimensional
analysis across all the feedback data that we receive. And that’s about 30 billion queries
per month. That again is a subset of the queries that are actually performed against our data.
The actual number is probably, you know, way north of 100 billion a month, because some
customers have higher performance requirements and choose to implement our data in a way that
doesn’t allow them to send feedback data back to us. What else is there?
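A minimal Python sketch of the join described above: each queried IP address is matched to its network block, and the block's dimensions are then used for counting. The block list, the dimension names (`country`, `connection`) and the helpers `lookup` and `count_by` are all hypothetical, and the addresses come from reserved documentation ranges. The talk does not describe Quova's actual storage or schema.

```python
import bisect
import ipaddress
from collections import Counter

# Hypothetical weekly IP data release: (network block, dimensions).
# The blocks are RFC 5737 documentation ranges; the values are made up.
BLOCKS = [
    ("192.0.2.0/24", {"country": "gb", "connection": "dsl"}),
    ("198.51.100.0/24", {"country": "de", "connection": "cable"}),
    ("203.0.113.0/24", {"country": "sg", "connection": "mobile"}),
]

# Index the blocks by first address so each lookup is a binary search.
_index = []
for cidr, dims in BLOCKS:
    net = ipaddress.ip_network(cidr)
    _index.append((int(net.network_address), int(net.broadcast_address), dims))
_index.sort(key=lambda row: row[0])
_starts = [row[0] for row in _index]


def lookup(ip):
    """Return the dimensions of the block containing ip, or None."""
    addr = int(ipaddress.ip_address(ip))
    pos = bisect.bisect_right(_starts, addr) - 1
    if pos < 0:
        return None
    start, end, dims = _index[pos]
    return dims if start <= addr <= end else None


def count_by(queries, dimension):
    """Join each queried IP onto its block and count by one dimension."""
    counts = Counter()
    for ip in queries:
        dims = lookup(ip)
        counts[dims[dimension] if dims else "unknown"] += 1
    return counts


# Made-up feedback data: one entry per customer query.
feedback = ["192.0.2.10", "192.0.2.77", "203.0.113.5", "10.1.2.3"]
print(count_by(feedback, "country"))
```

Storing the joined dimensions next to each query, rather than looking them up later, matters here because the IP data changes with every weekly release.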
>>TANCRED: So, some of the information that we assign to an IP address includes
geographic information from continent down to postal code. And then the network characteristics
we assign are things like the carrier or ISP, the organization that is responsible for the
content of the network, the domain, the speed of the connection, how the connection is routed
through the Internet, whether that IP address is associated with anonymizing activity,
and things like that. And we have this data going back pretty much since we started, about
10 years’ worth of data. You can imagine there’s a lot of data there. One of the reasons
we hadn’t looked at it before in the way that we’ve started
to look at it now is that dealing with all this data is kind of onerous; there’s a lot
of data to deal with. And so, we’ll talk a little bit about the technologies that we
used to actually aggregate some of the data, so it’s easier to query against and also
to mine. A little bit about gambling, though. Online gambling has kind of a
storied history. I mentioned the reason you can gamble online is because these companies
can now tell whether you’re somewhere where it’s legal. Back in 2006, you saw stories
especially about European companies and executives of European companies being indicted in the
U.S. because they were breaking U.S. laws by allowing
U.S. citizens to gamble from the U.S. So, being able to tell where users are coming
from is critical to industries like gambling. And gambling in general, online gambling is
a growth market. It represents 8% of the total market, or did last year, which
is significant. And it’s also growing: it’s growing 13%
per year and is projected to reach about $36 billion by 2012, which is a large market. And this
is all according to H2, which is a gambling industry analyst. And
because of the legality, it’s mainly in Europe and Asia that you see online gambling. That’s
not to say there isn’t any gambling in North America though and in the U.S. In the U.S.,
gambling is traditionally legislated by states, different states have different laws. You
can actually gamble online. You can do things like you can bet on horse races in certain
states. And you can play Poker online for money in some cases. But what’s happening
in the U.S., there’s legislation now being passed to allow certain kinds of gambling
across the U.S. It will still be regulated by states. And one of the reasons that’s happening
is because there are other laws being passed to allow that gambling to be taxed. And of
course, it is a significant market, so once you start taxing it,
it represents a significant revenue stream for the government. That’s one reason
why you’re going to see that. And what you’re seeing now is that some of those companies, those
same companies that were in trouble in 2006 are coming to the U.S. and they’re either
setting up shop or buying some of the existing gambling organizations in the U.S. And
gambling is interesting because it’s worldwide, so it has a lot of the aspects
that make IP address geolocation interesting. You need to localize the language for your
customer. You need to know where your customers are coming from so you can market to them.
And you need to restrict access. And there’s also a lot of fraud involved,
especially during these big events, what you see is online gambling houses, especially
smaller ones will be blackmailed by fraudsters who’ll say, you know, “I’ve set up a system
that can take down your gambling site and I’m going to do that, you know, during the
World Cup unless you, you know, unless you pay me X amount of money.” Some
of these sites have been destroyed because
they’ve ignored these threats. Some of them just pay out, but it’s really important for
them to be able to understand which threats are real and also to help prevent them. So,
gambling has a nice broad application for IP geo. So, a little bit about how we went about this.
We worked with a design company called Stamen Design. And they do a lot with really interesting
visualizations and they do a lot with geography. They did the maps for the last Olympics, and I
think they’re doing the 2012 Olympics in London as well. You can see some of the projects
they’ve done here: Crimespotting in Oakland and San Francisco is a project where you
can go and see real-time crime statistics for those cities. They’re responsible for
[INDISTINCT] labs where you can see different visualizations of big stories, log-on or logging-in
and wireless visualization. But they’re a fantastic design company, they do great work.
And we knew that working with them we would see the data in ways that we
hadn’t imagined, and see things that we wouldn’t see otherwise.
One of the ways that they were able to work with a large dataset is
through the use of Solr, which you can talk a little bit about.
>>SPECKBACHER: So, Solr is an Apache project, and it’s built on top of the Lucene engine.
It was developed at CNET in 2004 and then donated to the
Apache Foundation in 2006. What makes Solr interesting for a project like this is that
it allows you to rapidly dive into the data. It’s very fast to ingest data and build indexes
over it, and it provides faceted search and date faceting. Faceting basically
correlates to a GROUP BY operation that you can run in some [INDISTINCT]. So, we’ve used
that to explore the data with Stamen. And we’ll present some interesting visualizations
that we built on top of Solr, and we used some innovative newer graphing concepts for those
visualizations.
>>TANCRED: So, there are two kinds of graphs
that Stamen used with the data. The first is a Horizon Graph, which I’ll talk about in
a little more detail, and the second is a stream graph, which you might be a little
bit more familiar with. I’ll talk about horizon graphs first. Horizon graphs were introduced
in 2008 in a paper by these folks. Stephen Few is a design blogger and consultant;
his site is Perceptual Edge. And he wrote a paper talking specifically about the use of
Horizon Graphs by Panopticon, which is a commercial business intelligence company. A lot of the images
you see are from Stephen’s paper. But it’s a really interesting way to see data that
you would normally look at–temporal data you might normally look at in a line graph
in a compressed form where you can start comparing things and seeing things differently. So,
you have a traditional line graph and this is a very good way to look at data over time
and you can see variations in data, peaks and valleys. It’s pretty intuitive what these
mean. But it’s hard to compare one line graph to another. You can start overlaying line
graphs, you can start putting them beside each other, but it gets very busy, very quickly.
And you can see that this is 50 stocks over about a year in 2006, all with different
line graphs. And it’s impossible to really see what’s going on with these line graphs,
to really compare what’s going on with them. So, Horizon Graphs allow you to see the same
data but in a much more compressed form. And the way you do that is you draw a zero line in
the graph, ideally somewhere in the middle of the graph, depending on your graph, and you
color the space between the zero line and the data line. You color the space above the zero line
in one color, the space below the zero line in another. And what you have, if
you look at the red spaces, is that anywhere above the line you have empty white space. And
so, you can leverage that white space by essentially flipping the graph up. So, now
you’ve cut the graph in half. You can still see the peaks and valleys through the color
and you can compress it further. So, this graph has six bands of color, and you
can see the darker color on top. If you look at those parts of darker color, those polygons
fit in the polygon below in every case. And what you can do is basically compress them
down. So, what you wind up with is a graph that takes up less than a fifth of the space but
still gives you a very good sense of the data. So, you can see by the intensity of
the color and the color itself whether the data is positive or negative and where
the peaks and valleys are. So obviously, where the color’s more intense, the peaks and valleys
are higher and lower. So, if you look at that same graph of 50 stocks with Horizon
Graphs, you get a much richer picture of the data. You can see how individual
stocks have performed, which ones have done well and which ones haven’t,
and you can start to see trends temporally. So, you can see these stocks are all performing
negatively in this timeframe and these are performing positively. And that maybe gives
you some indication of where you might want to look deeper into the data. The other good
thing about this is that, across all these line graphs, it’s all relative.
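The folding and banding steps described above can be sketched in a few lines of Python. This is only an illustration of the general horizon graph construction, not the code used for the slides. The function name `horizon_bands` and the choice of three bands per sign (which would give six colors in total) are assumptions.

```python
def horizon_bands(values, num_bands=3):
    """Split a series into folded, collapsed bands for a horizon graph.

    Returns (band_height, rows). Each row is (sign, layers): sign is +1 or
    -1 and selects the color family; layers[k] is how much of band k is
    filled, from 0.0 up to band_height. All bands are drawn from the same
    baseline, with higher bands in darker shades on top.
    """
    peak = max(abs(v) for v in values)
    band_height = peak / num_bands if peak else 1.0
    rows = []
    for v in values:
        sign = 1 if v >= 0 else -1
        magnitude = abs(v)  # flip values below the zero line upward
        layers = []
        for k in range(num_bands):
            # Part of the magnitude that falls inside band k, shifted down
            # so that every band shares the same baseline.
            filled = min(max(magnitude - k * band_height, 0.0), band_height)
            layers.append(filled)
        rows.append((sign, layers))
    return band_height, rows


# Made-up relative changes, for example percent change from a starting value.
series = [0.0, 1.5, 3.0, -2.0, -0.5]
height, rows = horizon_bands(series, num_bands=3)
for value, (sign, layers) in zip(series, rows):
    print(value, "positive" if sign > 0 else "negative", layers)
```

With three bands per sign, the drawn height is one sixth of the original range from the lowest valley to the highest peak, which is consistent with the "less than a fifth of the space" figure mentioned in the talk.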
So, you’re seeing relative peaks and valleys instead of absolute numbers. You
might have one stock that trades at a very low price and another stock
that trades at a very high price, but you’ll see the same trends because the data is all
relative. So that’s Horizon Graphs. So, you know, we’re dealing with
countries all around the world. These are the line graphs of the countries. You can
start to see–well, first of all, you can’t see many countries on one page. You can start
to see maybe some trends in terms of where the peaks and valleys are, but it’s hard to
kind of see them. So, this is actually a single-color Horizon Graph, and this is Internet
traffic to gambling sites from different countries around the world. And immediately you start
to see things. This is just about a week before the World Cup. Immediately, you start
to see, if you look at the right edge of each of these columns, a lot of
activity there, which correlates with, you know, the day before and the day of the World
Cup. And you still see individually where you have a lot of activity. Like in Germany,
there’s always a lot of activity, versus Guinea, where there’s not a lot of activity until
the World Cup. And you have many more countries here on this graph than you did
before. So, it’s a really powerful way to see data, temporal data, when you’re looking
at lots of elements. So, this was really neat. And it does show some trends. It really gets
interesting when we start looking at the stream graphs though, so I’ll let Tobias talk about
the stream graphs.
>>SPECKBACHER: All right. So, stream graphs
are a type of stacked graph, a complex layered graph. The concept was developed by Lee Byron,
who developed it out of a personal interest in visualizing his listening habits on
Last.fm [INDISTINCT], lots of different data about which music you listen
to and how often you do that. So, he tried to do that with line graphs and different standard
visualization techniques, and none of these really brought a clear picture to the table.
So, he developed the stream graph concept, which really excels when you’re trying to
present lots of data to a mass audience. It’s not a
highly statistical representation of the data, but it gives you ideas of trends
and how the different layers behave independently. In 2008, the New York Times published a stream
graph that showed the box office ticket sales performance of 7,500 movies over the past
21 years. And so this was kind of the first publication of a stream graph that was very
popular. And it evoked different kinds of emotions. So, probably more technical people
didn’t feel that good about it because it doesn’t really give you a good quantitative
image of what’s going on. And less technical people really like the representation because
it is very aesthetic and it lets you visually explore the data much, much better than a
more accurate representation of the absolute numbers. So, here’s an example.
>>TANCRED: We’ll help you.>>SPECKBACHER: We’ll get them.
>>TANCRED: Basically–and this is actually what we’ll walk through. What the stream graphs
do is they let you start seeing trends and then depending on your system, you can start
drilling down into the data either with more stream graphs, which is what we’ll do or other
data. So, this graph is worldwide Internet traffic to some of our gambling customers
from the 5th through the 13th of June of this year. And, of course, the World Cup
started on the 12th. So, what you see is a pretty regular pattern of Internet traffic.
It’s heavily dominated by European countries and the U.K., mostly because a lot of our
gambling customers are in the U.K. But also they have a pretty strong gambling culture there,
online gambling culture anyway. And you see there’s a lot of activity during the day.
It drops off at night, comes back during the day. You see more activity on Saturday
than on the other days of the week, but it’s pretty regular until the day before
the World Cup, where you see it spike and then continue to stay high. So, this is interesting.
It is dominated by the U.K. and Europe. So, what we’re going to do is drill down into
different continents and different countries and then eventually different network characteristics
of the data to see other trends. And you can see little examples of little anomalies in
here, but once you start drilling down they become a little bit more apparent. So, if
we look at just Europe, it pretty much looks the same. You start to see little weird things,
like up here you see this little chokepoint but it pretty much looks the same. So, let’s
take a look at everything but the U.K., since it was so heavily weighted toward the
U.K. So, now, it starts to look a little bit different. The
rhythm is still there, but it’s less extreme. So, you see more activity throughout the day.
Also, on the first graph, you could see this little blip, but it becomes a lot more
apparent here. Friday morning, there’s something going on. And you see that’s this red band
in the middle, which is associated with the U.S. So, there’s something going on there.
But you also see different countries behaving differently. So, the blue up here, right above,
is the Netherlands and they have a very regular rhythm of activity during the day and not
much at night versus some place like Denmark, which is down here, which has pretty regular
activity throughout the day. And then you also have like this green up here is Singapore,
where there’s not a lot of activity at all in the week before the World Cup and then
it really just blows up. So, if we look at Asia–yeah?
>>I’m sorry, but what’s technically the buildup with Vietnam? I don’t understand [INDISTINCT]
>>TANCRED: That’s a good question, I’m glad you asked, because it’s very important to
understand it. So it’s like a stacked graph: the thicker the band of color,
the more traffic, the more queries. And what this data represents is IP address queries from
these companies. It doesn’t necessarily mean that people are gambling; someone could
be coming from the U.S. and hit the site and be denied.
>>So my question basically is, what is zero and why is it different from a graph that
is less [INDISTINCT] stacked graph?
>>SPECKBACHER: So typically when you stack
graphs, you have a couple of issues. First of all, if you use lots of time series, the series
that don’t contribute that much data kind of disappear in the graph visually. The
other issue is, if you have two series of equal vertical height but with different slopes,
one of the two tends to disappear visually. So, this methodology really is to visually
pull those out and not make them disappear and stand apart. So it’s not so much like
I need to know exactly the slope and I want to know what the movement of the individual
>>SPECKBACHER: Right.>>Like how did you choose [INDISTINCT]
>>SPECKBACHER: It’s actually an algorithm that you…
>>Oh.>>SPECKBACHER: Yes. So, yeah, there’s
detailed documentation in the paper that was linked on the previous slide.
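The "what is zero" question comes down to where the bottom edge of the first layer sits. A minimal Python sketch of three choices follows. The function name `stack_layers` and the sample numbers are made up. The `wiggle` option uses the baseline formula from Byron and Wattenberg's paper "Stacked Graphs: Geometry & Aesthetics", g0 = -1/(n+1) * sum over i of (n - i + 1) * f_i. The talk does not say which baseline the graphing software used, so that part is an assumption.

```python
def stack_layers(series, baseline="zero"):
    """Stack equal-length series; return one (bottom, top) pair per layer.

    baseline="zero":      ordinary stacked area graph, bottom edge fixed at 0.
    baseline="symmetric": the whole silhouette is centered around 0.
    baseline="wiggle":    bottom edge chosen to reduce the slope ("wiggle")
                          of the layers, weighting lower layers more heavily.
    """
    n = len(series)
    length = len(series[0])
    g0 = []
    for t in range(length):
        column = [layer[t] for layer in series]
        if baseline == "zero":
            g0.append(0.0)
        elif baseline == "symmetric":
            g0.append(-0.5 * sum(column))
        elif baseline == "wiggle":
            # With 0-based i, the weight (n - i) equals (n - i + 1) for 1-based i.
            weighted = sum((n - i) * f for i, f in enumerate(column))
            g0.append(-weighted / (n + 1))
        else:
            raise ValueError(f"unknown baseline: {baseline}")

    layers = []
    bottom = g0
    for layer in series:
        top = [b + f for b, f in zip(bottom, layer)]
        layers.append((bottom, top))
        bottom = top
    return layers


# Two made-up layers with opposite trends and a constant total of 4.
data = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
for name in ("zero", "symmetric", "wiggle"):
    print(name, stack_layers(data, baseline=name))
```

In every variant the thickness of a layer at a given time is still its value; only the vertical offset changes, which is why absolute totals are hard to read off a stream graph.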
>>TANCRED: Yeah. And you’ll see–you’ll see kind of how it differs from a stacked area
graph when we look at the U.K. specifically. It’s a nice example of how a stream graph
is different from a stacked area graph in some ways. Does that
help at all? I mean basically, what you’re looking for here are trends,
and in some cases it gives you some answers, but in more cases it just raises additional
questions that you may or may not be able to answer with a stream graph. So we’re
looking at Asia. So Asia looks a little bit similar to Europe, except that you don’t have
that big spike on Saturday, because for the customers that we’re seeing in this
traffic, Asia isn’t as much of a gambling culture traditionally, but you do see them
coming to these gambling sites during the World Cup, before and during the World Cup.
So, and again, you get a much better view here of the impact of Singapore and their
big traffic, which is represented in the middle here where it just kind of explodes. So this
gives you an idea of gambling patterns in Asia. If we look at the U.S., where you saw
that kind of weird spike: this is North America, but instead of by country,
we did it by organization because it actually gets very interesting. The
immediate thing that you might notice here is that the regular rhythm is gone. It’s
a pretty straight graph for the most part. You have these blips, which I’ll talk about
in a second, but even in the bands, there isn’t a regular pulse of activity. So when
you look at the actual organizations, it’s hard for you to read that, but red is Google,
so either your counterparts in Mountain View are staying up all night gambling every day
or there’s something else going on. You start looking at the other organizations like Microsoft
and Yahoo! and you realize that these are [INDISTINCT] that are indexing the site.
So that all of a sudden makes sense, where before, you might have seen a lot of traffic
from North America to these sites and not been able to explain it, because really you go to
a site once, you get denied, and you don’t try again. This is much more understandable.
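The missing day and night rhythm can also be checked numerically instead of by eye. The following Python sketch is a simple heuristic invented for illustration, not something described in the talk: it flags organizations whose hourly query rate at night is nearly as high as during the day. The function name `flat_rhythm_orgs`, the `DAY_HOURS` window, the 0.8 threshold and the sample records are all assumptions.

```python
from collections import defaultdict

DAY_HOURS = range(8, 24)  # assumed local waking hours; an arbitrary choice


def flat_rhythm_orgs(records, threshold=0.8):
    """Return organizations whose night traffic is nearly as high as day traffic.

    records: iterable of (organization, local_hour, query_count) tuples.
    threshold: minimum ratio of the quieter period's hourly rate to the
        busier period's hourly rate for an organization to be flagged.
    """
    day = defaultdict(int)
    night = defaultdict(int)
    for org, hour, count in records:
        bucket = day if hour in DAY_HOURS else night
        bucket[org] += count

    day_len = len(DAY_HOURS)
    night_len = 24 - day_len
    flagged = []
    for org in sorted(set(day) | set(night)):
        day_rate = day[org] / day_len
        night_rate = night[org] / night_len
        busiest = max(day_rate, night_rate)
        if busiest == 0:
            continue  # no traffic at all, nothing to compare
        if min(day_rate, night_rate) / busiest >= threshold:
            flagged.append(org)
    return flagged


# Made-up hourly counts: a crawler queries evenly, an ISP's users sleep.
records = []
for hour in range(24):
    records.append(("crawler.example", hour, 100))
    records.append(("isp.example", hour, 100 if hour in DAY_HOURS else 5))

print(flat_rhythm_orgs(records))
```

A flat profile is only a hint. Automated traffic is one explanation, but an organization whose users span many time zones would look similar.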
These kind of anomalies are weird. This one on this side was Comcast Cable in Centerville,
California. And so there was just a bunch of activity on Saturday. I don’t know why.
I mean, we can look at it further and we can say, “Okay, which sites were they
going to? What IP addresses were they? Is it many IP addresses or a single IP address?”
But it’s something to look into. It could be completely legitimate or it could be illegitimate.
It could be someone probing the site before an attack. It could be someone probing the
site for legitimate reasons. It could be the site itself running some tests.
You see the same thing here. This one’s in Phoenix from a publishing company. Again,
very odd to see that level of traffic the day before the World Cup, but it could be,
again, legitimate or illegitimate. And certainly it’s strange. You also see that chokepoint
that I mentioned earlier, much more pronounced here. And you see that on other graphs. That
could be an attack; maybe the servers went down because of an attack, or maybe they went
down because they crashed, or maybe some of these sites took their service down
for maintenance. I mean, it’s a bad maintenance window in that
it’s in the middle of the World Cup, but if something bad was happening and they had to take the
site down, then it probably makes sense to do it when traffic was low anyway. So that’s
probably what it is. But it’s interesting looking at these graphs and kind of coming
up with theories for this. And then as a customer of the data, you would be looking at this.
As an industry, it [INDISTINCT] about what’s happening on the industry. Yeah?
>>[INDISTINCT]
>>TANCRED: It’s not just relative;
you can get information about how many total queries this is, and then you can start figuring
out what the traffic numbers actually are. What I would do if I actually wanted to know
what those numbers are is query the data directly for that timeframe and find out what
the numbers are. I don’t have them for this particular graph, but we could come up with them. Yeah?
>>[INDISTINCT]>>TANCRED: So that’s–that’s the way that
the graph works. It tries to–and maybe you can explain it better, Tobias, but it tries
to kind of equalize the data. And you’ll see this in some other graphs where there’s less
data, that the graph shifts more. Where there’s more data, it’s better at equalizing.
>>[INDISTINCT]
>>TANCRED: Yeah. Right, right. And I don’t
know exactly what the graphing software’s doing there, but it’s basically an artifact
of the graph.
>>SPECKBACHER: This one?
>>TANCRED: Yeah. So this is everything but Europe, Asia and North America. So, again,
you see this kind of shift because there’s less data overall, so the weighting is less.
But you start to see interesting things again, like which countries outside of those three
main markets are good markets for gambling and gaming. And so, here you have South America
in green and in gray, we got two grays, oh, Australia. And you see, again, South America
has a good rhythm. Australia, they stay up later or they’re gambling at different times,
but it’s more of an equal band until you get to Wednesday. Interestingly, Australia started
betting really early. I was looking at other countries
like Malawi. And when I was looking at Malawi, I was just looking between Friday and Friday,
and it was just basically flat except for a spike somewhere on Monday or Tuesday. And
I thought, well, “I guess they didn’t have a team in the World Cup so they weren’t
interested in it,” because every other country started betting on Friday.
And then I looked at Saturday, and there was a huge spike. So it’s just interesting
to see the different mentality of different countries. And I don’t think Nigeria played
until the 13th, so it could be that they were betting on African teams. I don’t know. But
it’s interesting to come up with hypotheses about this. So now, we’ll look at three different
countries in Europe, starting with the UK because it represented so much data. This
is just a very interesting stream graph because you basically have a stream graph, and
if you take away London, you have a stacked area graph on top of it, because London basically
creates the zero line. But this essentially matches the European data in terms of its
pulse, and again everything I talked about with betting on the weekend and the chokepoint
and things like that. So if we take away London, it’d be interesting to see if the U.K. is
sort of homogeneous in the way it gambles, and the graph essentially looks the same.
You start to see a little bit more detail in terms of what other cities in the U.K.
are gambling online, but it basically looks the same. So, let’s look at something that
looks different. So here’s Germany. This kind of has this rhythm, but it’s also a little
bit all over the place. You have, you know, Monday morning people come into work and they
stop betting, but then they sort of get over their guilt and they go online and continue
betting. Germany’s first game is on the 12th, and so, you see a big spike here. But it’s
pretty consistent; they’re online all the time betting, unlike the U.K. And you also
have this huge area that kind of looks like London did in the U.K. except this is Karlsruhe
which is not any place I’ve heard of. So, it’s a little bit harder to explain until
you start looking a little bit deeper into the data. And this is actually 1&1 Internet
AG. They’re an Internet provider. They have a big hosting facility in Karlsruhe. And so,
you know, we’re locating their traffic where their datacenter is because that’s the last
point we see. And so, in our data, this would be represented with the routing type of regional
proxy so, you know, we know what country it’s in, but we can’t necessarily tell you what
city it’s in. But at least we can tell you it’s Germany. And so, now that makes a little
bit more sense. So that’s Germany. We’ll look at Denmark next, which also looks really crazy.
There’s really no pattern here. You have this huge red and this huge blue. Definitely, you
see a lot of activity during the World Cup. And so, that big red most likely represents
consumer traffic. It’s strange that this blue is really active here and really active in
the middle of the week before the World Cup. And then, kind of dies out completely. When
you look at the organization behind this, that blue is basically a website that reports
odds for games and it refers traffic to the gambling houses. So, for whatever reason,
there’s a lot of people online checking the odds of different matches, whether it’s World
Cup or not and going to betting sites and placing bets. The red is similar to what you
saw in Germany in Karlsruhe; it seems to be a hosting provider, although it also provides
VPN services. I don’t know why there’s a big spike there. Maybe there was some other major
sporting event that people were betting on. But certainly, if I want to
learn more about the Denmark market and how it works, this is something that
I would start looking into: why there might be a big spike and then a complete drop
in activity, and what’s going on. So I mentioned that we have this geographic data; we also
looked at the data in terms of the network characteristics. Tobias will cover the next few graphs,
and they show how people are connecting and routing to get to these gambling sites.
>>SPECKBACHER: Right. So what we see here is a stream graph representing the connection
types. Meaning, what we do is we categorize network blocks by how they are connected to
the Internet. So you have DSL and cable down here in red and yellow, which,
you know, you would expect to be dominating. There’s a pretty healthy amount of mobile
betting going on here that’s represented as purple on this graph. And we have this
green band that shows this uniform traffic coming through here on fixed connections, so
that again is most likely the U.S. traffic that we saw earlier that originated
from the large search providers, and we can see that as fixed connections here.
>>[INDISTINCT]>>TANCRED: Yes, you want to…
>>SPECKBACHER: Yes. All right. So, as I said there was a pretty healthy amount of mobile
betting going on. And now we’re segmenting the data by mobile providers. And
since most of the traffic came from the U.K., we see T-Mobile U.K. and Hutchison 3G as, I think,
the dominant providers here. It’s interesting to slice
data like that to understand which providers users are with; you can use
that for marketing or to target ads. But just the fact that you actually are
able to identify that traffic is coming from a mobile carrier helps you, in a sense, because you know
the user’s mobile, so whatever IP geo-location tells you is probably something that you should
not rely on 100%, but you can use confidence factors and other data points that we give
our customers to understand these circumstances. So, there was also a segment of dial-up users.
And that was actually kind of surprising because there was a decent percentage of…
>>TANCRED: Yeah.>>SPECKBACHER: …of the overall traffic.
And again, the U.K. has dominated in the traffic there. There was some of the U.S. traffic
>>SPECKBACHER: Yeah, Japan.>>TANCRED: Tanzania.
>>SPECKBACHER: And then, there’s, you know, lots of developing countries on there, which
apparently still use modems. Anonymizers. So when you’re operating a gambling site,
you want to make sure that your customers are not circumventing your IP geo-location
solution. And typically, they’ll try to do that by connecting through a proxy server that
provides a certain level of anonymity. If you’re trying to gamble with
a U.K. provider, what better proxy to use than one in the U.K.? And that’s basically
what we see here.>>TANCRED: Maybe, I can say a word about…
>>SPECKBACHER: Yeah.>>TANCRED: …anonymizers in the data. So
the way that Quova identifies anonymizers: we identify anonymizers by specific IP address
and the activity we see. We also, because we provide our data as network blocks,
identify network blocks that have anonymizing activity in them. So, a lot of this activity
is probably not anonymizer activity but is in a network block where we’ve seen anonymizer
activity. So I certainly wouldn’t expect that every transaction that you see here is
associated with someone using a proxy. But you can see that, you know, the graph certainly
gets wider as it moves to the right, which is what you’d expect during a big event: that
you’d see more anonymizer activity at these sites. And as Tobias said, more in the U.K.
because they’re trying to reach sites that are in the U.K.
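The distinction drawn here, between a specific IP address known to be an anonymizer and an address that merely sits in a network block where anonymizer activity has been seen, can be sketched in a few lines. This is only an illustration and not Quova's method: the function names, the three labels, and the sample addresses (taken from the reserved documentation ranges) are all assumptions.

```python
import ipaddress

def flag_blocks(blocks, known_anonymizer_ips):
    """Return the network blocks that contain at least one known anonymizer IP."""
    networks = [ipaddress.ip_network(b) for b in blocks]
    anonymizers = [ipaddress.ip_address(ip) for ip in known_anonymizer_ips]
    return {net for net in networks if any(ip in net for ip in anonymizers)}

def classify(ip, flagged_blocks, known_anonymizer_ips):
    """Label an address (hypothetical labels, for illustration only)."""
    addr = ipaddress.ip_address(ip)
    known = {ipaddress.ip_address(a) for a in known_anonymizer_ips}
    if addr in known:
        return "known_anonymizer"   # the exact address was seen anonymizing
    if any(addr in net for net in flagged_blocks):
        return "flagged_block"      # only the surrounding block was flagged
    return "clean"

# Hypothetical sample data from the reserved documentation ranges.
blocks = ["192.0.2.0/24", "198.51.100.0/24"]
known = ["192.0.2.10"]
flagged = flag_blocks(blocks, known)

for ip in ["192.0.2.10", "192.0.2.77", "198.51.100.5"]:
    print(ip, classify(ip, flagged, known))
```

An address labeled "flagged_block" corresponds to the case described above: traffic that is probably not anonymizer traffic itself but shares a block with it, which is why not every transaction in the graph should be read as a proxy user.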
>>SPECKBACHER: Right. So, basically what we flag is bad neighborhoods so like for crimespotting
data, if you look at it, this is the network block that had some suspicious activity going
on in the past or recently. So you should be cautious in dealing with that type of traffic.
And so now, we segmented the anonymizer populations by carriers and it’s not very surprising that
most of these anonymizers are actually with hosting providers. So, they’re probably not
systems that are actively being used by actual users, unless this is having betting with
some customers. Yes?>>TANCRED: And this can be compromised machines
or hosts that people have set up specifically for this?
>>SPECKBACHER: Yeah. So, someone might get [INDISTINCT] set up with or the other possibility
is just that boxes get rooted and [INDISTINCT].>>TANCRED: And the significance of this information
is that when you’re trying to prevent fraud, when you’re looking at traffic coming into
your sites, the more things you can correlate with, the better your prediction capabilities
are. So if you can correlate, if you know that certain carriers or certain organizations
or certain countries or certain connection types correlate better with known fraud, then
knowing all that data when the traffic is coming in lets you treat those connections
differently than you would otherwise. And that’s what the financial institutions do,
that’s what e-commerce sites do, that’s what gambling houses do. And that’s why it’s important
to have this information. So, you know, it was a pretty brief look at a very small part
of our data. We’re just starting to look at this data. We’re just starting to look
at different ways to visualize the data. What we’d like to do is make a lot of this information
public because the more people looking at it the more interesting things we’ll find
in the data. As people start looking at the data, I expect that, you know, we’ll see more
trends in the data and that we can start to use a lot of this user data to do things
like predict events, predict and prevent fraud, look at marketing trends. And there are certainly
going to be a lot of assumptions that people have about traffic to different markets from
different places that can be either confirmed or disproved with this data. So, we’re excited
about this. We’re going to continue looking at it, like I said, hopefully, we’ll make
this data public pretty soon. And that’s it, any questions? Thank you. We were so interesting
that we distracted you.>>Yeah I am. So this is about [INDISTINCT].
>>TANCRED: Well, I mean, what the laws typically state is that you’re using, you know, industry
best practices.>>Oh [INDISTINCT].
>>TANCRED: And, yeah, and it’s not–and there are certainly ways to get location data that
are not industry best practices. So if you’re trying to–if you’re, you know, selling restricted
goods to different countries around the world where those goods aren’t supposed to be sold…
>>Right.>>TANCRED: …then, using things like
user-reported data wouldn’t be sufficient. You have to use some other kind of data or, you
know, even with GPS now you see spoofing there, so, yeah.
>>[INDISTINCT]>>TANCRED: Yes, in our experience.
>>[INDISTINCT]>>TANCRED: Yeah, sure.
>>[INDISTINCT]>>TANCRED: Yes. So, let me ask–I’ll repeat
the question because I don’t know if the questions are coming through in the recording but the
question is, “When we put this data out to the public, do we know what kind of visualizations
and graphs we’ll allow, whether they’ll be static or dynamic and things like that?”
You want to take that?>>SPECKBACHER: Sure. So, certainly, our goal
is to enable lots of people to explore the data. So, static graphs are not going to be
very suitable for that. Obviously, we’ll have to provide some level of pre-aggregation to
protect the innocent customers. But, you know, we can provide dimensionally aggregated data
and let people slice and dice those datasets however they want. So that’s the plan.
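As a rough sketch of what "dimensionally aggregated data" could look like in practice: raw records are reduced to counts per combination of a few dimensions, customer-identifying fields are dropped, and the resulting table can be rolled up along any subset of those dimensions. The field names, the sample records, and the small-cell suppression threshold (`min_count`) are assumptions for illustration; the talk does not describe the actual scheme.

```python
from collections import Counter

def aggregate(records, dimensions, min_count=1):
    """Count records per combination of the chosen dimensions.

    Only the listed dimensions survive, so per-customer fields are dropped.
    Cells with fewer than min_count records are suppressed (an assumed policy).
    """
    counts = Counter(tuple(r[d] for d in dimensions) for r in records)
    return {key: n for key, n in counts.items() if n >= min_count}

def roll_up(cells, dimensions, keep):
    """Slice an aggregated table down to a subset of its dimensions."""
    idx = [dimensions.index(d) for d in keep]
    out = Counter()
    for key, n in cells.items():
        out[tuple(key[i] for i in idx)] += n
    return dict(out)

# Hypothetical records; "customer" is the field that must not be published.
records = [
    {"customer": "site-a", "country": "GB", "connection": "dsl"},
    {"customer": "site-a", "country": "GB", "connection": "mobile"},
    {"customer": "site-b", "country": "GB", "connection": "dsl"},
    {"customer": "site-b", "country": "US", "connection": "fixed"},
]
dims = ("country", "connection")
cells = aggregate(records, dims)
by_country = roll_up(cells, dims, ("country",))
print(cells)
print(by_country)
```

Publishing only the aggregated cells lets people slice by country or connection type without ever seeing which customer a record came from.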
>>TANCRED: And I would expect that we’re going to probably provide some interesting
visualizations like this and maybe some more traditional ones that let people get a little
bit more statistical and specific with the data.
>>…it’s getting anonymized and what have you going with it all connected on. But you’ve obviously shown
these kind of neat new graph types.>>TANCRED: Right.
>>Are you implying something in the space where people will actually be able to navigate
these graph types?>>TANCRED: That’s our plan. I would expect
we’re not going to–well, at least in the first instance, the first exploration will
be through different graph types rather than just accessing the data directly. Although, we
might, depending on how we can aggregate and anonymize the data, make the data directly
available.>>And my second question is with the stream
graphs, have you done any kind of cross-dimensional analysis back where you all are actually using
it to find support correlation and trends into the dimensions with different methods?
>>TANCRED: It’s interesting, we’ve like–we’ve done that with multiple stream graphs. Like
I was talking about, looking at a specific city that shows weird activity and then looking
at different dimensions of that but that’s by running different–well, yeah, running
different stream graphs. And it’s actually been very interesting for us to see certain
things about our data that weren’t completely evident to us before. But I don’t know what
the stream graph’s capabilities are to look at multiple dimensions in the same graph,
if that’s what you’re asking.>>[INDISTINCT]
>>TANCRED: Yeah. I mean, what we wound up doing a lot, I mean, Tobias and I spent, we
basically spent a long time just creating interesting graphs. And you wind up creating
graphs on specific metrics and excluding specific things to get to the answer you’re looking
for. You know, so you look at interesting things like routing types against cities,
against carriers and organizations until some things start to make sense. Like that chokepoint
that we saw early Saturday morning, I think it was. If it exists across every routing
type and across every customer that we’re looking at and in every country, then it indicates
something maybe industry-wide. If it only exists for one of the customers then it’s
something specific to that customer. And so, that’s the kind of exploration you want to
do.>>Thanks very much.
library that you can use to create this. It’s called Protovis.
>>TANCRED: You know, I know everyone’s wondering about the stream graph on my shirt, so I’ll
answer that question, yeah, yeah. I didn’t plan on wearing the shirt. I brought it and
Tobias pointed out that there’s essentially a stream graph on it, so I realized
I had to wear the shirt. So it’s not entirely intentional.
>>SPECKBACHER: It was a designer’s idea data, I think.
>>TANCRED: Yeah. It’s a–yeah, I’ll let you decide. Thank you.