In Which I Brag About Brand Safety

This afternoon I read, with slight amusement, this article: Google, ANZ, GE Money ads caught on porn sites. Why? Because it's within the scope of my job, and I couldn't help but feel I had done quite well. In short, it gave me pride in my job (a strange feeling; I usually feel that my work could improve). I felt good. And yes, because of that, I am going to write and brag about it.

You see, the company I work for (henceforth: my company) is not a large company with a lot of resources, and it is quite comforting to know that bigger, better-resourced companies face the same problem. We've recently been pursuing brand safety as a selling point for the agency-facing side of the business. Come to us, we say, for we will sell you only brand-safe inventory, and we will only show your advertisements on brand-safe sites. It is a bold claim for a medium-sized company like ours. I happen to be the one doing quite a bit of the work on the brand-safety side of things (no, I'm not the only one; my colleagues also play a large role in creating our brand-safety policy). Join me, dear reader, as I regale you with tales from the world of online advertising.

Brand Safety

What is brand safety? Essentially, imagine that you own a big brand like Coca-Cola. Would it be good for your brand's image if Amy Winehouse was pictured dead with bottles of Coca-Cola lying around (too soon?)? No, it wouldn't, and hence the image is not brand-safe. It is likewise for online advertising. Unfortunate placements, like having your ad show up on porn sites, are a bad thing. However, defining brand safety was a large problem for us when the company pivoted towards brand safety as a unique selling point. We brainstormed over this for a while, and we finally settled on these "canons":

Brand-safety is not independent of brands. Brand safety in online advertising is heavily dependent on the brands that are being advertised. As such, digital advertising agencies like Adconion should always ask for the brand safety guidelines of the brands from the advertisers themselves. This brand safety information should then be communicated to the ad networks.
There are multiple levels of brand safety that can be universally agreed upon. This is sort of like a corollary to the first canon. While sites like DeviantArt are debatable and often subject to the brand safety guidelines of the advertisers (we've had different advertisers who approved and rejected DeviantArt respectively), there should be a baseline set by the ad network on what exactly is brand-safe inventory. (I am aware that publisher networks are also responsible for declaring what they deem brand-safe, but this blog only regards ad networks, seeing that I actually work in one.) It is my opinion that this baseline will make or break any ad network that deals in brand safety.
There exists non-premium brand-safe inventory. A common misconception is that if an ad placement is on an obscure site on the internet, it is non-brand-safe. True, sites like the Sydney Morning Herald are premium, and indeed, brand-safe. Premium sites are defined as sites where an advertiser can expect to pay a higher price (usually CPM) to get an impression. Premium sites signal brand safety, but are not necessarily brand-safe.
Observe and Report. Brand safety and brand protection are a constant monitoring process. At work, we use up to 5 tools to constantly monitor brand safety. I wrote 3 of those 5 tools. We have an almost 24-hour process monitoring which sites our ads are on. Commercial products like AdXpose (now AdEffx) and Peer39, or the very new and mysterious Project Sunblock, are available, and we do use them too. But to our surprise, we found that additional tools were needed, and so I wrote them.
Trust No One. This applies more to media buys. There have been multiple incidents where a media buy we had thought to be brand-safe turned out to be very ugly - we lost several clients and learned several very expensive lessons. As such, we have instituted the Trust No One rule, under which we constantly audit the impressions internally to see if they were brand-safe.

Problems in the Industry

Having set out the five "canons", let's make a detour and review the problems that currently exist in the online advertising industry. Do note that these are based on personal observation and have not been recorded in an empirical fashion (this is unbecoming of me - I used to track everything):

Brand-safety is not independent of brands. More often than not, agencies are clueless or vague about the brand-safety requirements of their clients. This gets passed on to the ad networks, who will do their best on the brand safety front. Often, tools like Peer39 and AdXpose will be used, but with brand safety as defined by the vendors (AdXpose or Peer39) rather than by the client. A good example is the category of file-sharing sites. AdXpose uses an inverse scoring function with a threshold to define the brand safety of a site. By the definitions of AdXpose, a number of file-sharing sites are indeed brand-safe, but when we asked the client for proper clarification, they turned out not to be. I personally think ad agencies should take a more proactive step in vetting site lists, and in providing clearer brand-safety guidelines to the ad networks. They are, after all, representing their clients.
There are multiple levels of brand safety that can be universally agreed upon. A lot of the ad/publisher networks I have encountered set a fairly low bar for brand safety. This is by no means a criticism of them. As an economist, I fully understand that a firm exists to make money for itself. In terms of game theory, brand safety can be thought of in two ways - one: it is a coordination game - if everyone goes for brand safety within a market, the market would be better off; two: it is a repeated public goods game, where brand safety is treated as a public good, and repeated plays of the game make contributions to the public good (i.e. brand-safe inventory) lower and lower. There are many other ways to frame the problem, but I am personally too lazy to spell them all out.
There exists non-premium brand-safe inventory. This is a major problem, albeit one that can be solved by mathematics and better algorithms (in fact I have worked out some of the algorithms to increase the efficiency of our media buys). You see, ad networks solve a problem of many-to-many matching, but they have also brought the industry new problems, such as inefficiency. Where in the past Advertiser A and Publisher P could have just gone through Network N, there is now an entire array of networks in between A and P. This is particularly obvious in marketplaces like OpenX marketplace or Right Media Exchange. The inception of Demand Side Platforms was supposed to solve this problem, but it's a problem that was not properly understood, and hence different DSP implementations vary in how much they make the market more efficient. On the plus side, if you know where to look (or in the case of online advertising, what graph to traverse), you CAN indeed find non-premium brand-safe inventory.
Observe and Report. A lot of ad networks fail this one. We did too, in the past, but after several important and expensive lessons, we've learned to constantly observe, report and take action. To me, arguments that doing so would be too laborious or too tedious a task are clearly excuses. In my company, we're not perfect, but we're getting there.
Trust No One. As previously mentioned, we learned hard lessons on this issue. We did a media buy on a very premium publisher network with a list of premium, brand-safe sites. But human errors DO occur, and because at that time we did not have the checks and balances in place, we failed to observe that our ads were serving on non-brand-safe inventory. We lost some business along the way, and I feel that, as a whole, any company in the industry dabbling in brand safety can do with learning this.
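The public goods framing in the second point can be made concrete with a toy calculation. What follows is a minimal sketch of a textbook linear public goods payoff, not anything we measured: the endowment, the multiplier, the number of networks and the `payoff` function itself are all made-up assumptions for illustration. Read "contribution" as the effort a network spends keeping its inventory brand-safe.

```python
def payoff(own_contribution, all_contributions, endowment=10.0, multiplier=1.6):
    """Linear public goods payoff (illustrative numbers only).

    Each player keeps whatever it does not contribute, and receives an
    equal share of the multiplied common pot.
    """
    pot = multiplier * sum(all_contributions)
    return endowment - own_contribution + pot / len(all_contributions)


n = 4  # hypothetical number of ad networks in the market

# Everyone invests fully in brand safety.
everyone_contributes = payoff(10.0, [10.0] * n)

# One network invests nothing while the other three invest fully.
lone_free_rider = payoff(0.0, [0.0] + [10.0] * (n - 1))

# Nobody invests.
nobody_contributes = payoff(0.0, [0.0] * n)

print(everyone_contributes, lone_free_rider, nobody_contributes)
```

With these made-up numbers the lone free rider does better than it would by contributing, even though everyone contributing beats nobody contributing. That is the incentive that pushes contributions down over repeated plays.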

What I Do

So where do I, an economist/statistician/mathematician, come in? Well, mainly the statistician part. On an average day, my company deals with about 1 million brand-safe impressions per hour. That's about 20 million impressions a day (we actually do more than that, but ~20 million imps a day is what we monitor, because they're our most premium campaigns). There is no way a compliance team is able to sit down and audit 20 million impressions a day. So who do you turn to? The only statistician in the company.

I essentially built a program to audit impressions pretty much live. It will sound alarms and klaxons (not literally, but it's been a recurring joke in the office that we should install speakers to do that) if an ad has shown up on non-brand-safe inventory. The program is built with Python, Redis, MySQL, ZeroMQ, and a smattering of JavaScript in the form of Node.js. We went through 4 versions in the span of 6 months, where each version marked a shift in architecture paradigm.

It initially started with just writing a daily CSV of all the impressions served. Then, as the requirements changed and we had to track more impressions, the architecture slowly changed from a single-process, single-threaded program that wrote flat files, to something multi-processed that used a combination of MySQL and a flat file, to something scalable that used Redis, MongoDB and MySQL, and finally to the upcoming version, which uses Redis and MySQL. All the while, the level of programming went lower and lower. I started with highly abstracted files, and the upcoming version uses raw socket information hot off the ad server.

The program itself does this:

Monitor ad impressions
Check ad impressions against a cached Support Vector Machine database. Raise alarm if unsafe.
In the background, there are services that feed the Support Vector Machine - it is a program that "learns" what sites are brand-safe and what sites are not.
The Support Vector Machine is itself boosted by various smaller SVMs, and a human component also exists to "teach" the machines. This, of course, is the secret sauce, and I cannot divulge further.
While the SVM runs asynchronously with the impression monitor, there is another process which I will refer to as a super-asynchronous process, which runs from time-to-time to validate and revalidate the SVM data.
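
The first two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual program: a plain dict stands in for the cached classifier output (the real thing sits behind Redis and MySQL), the classifier itself is left out entirely, and every name here (`verdict_cache`, `pending_classification`, `check_impression`, `monitor`, the example hosts) is hypothetical.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical stand-in for the cached classifier verdicts.
# True = brand-safe, False = not brand-safe.
verdict_cache = {
    "news.example.com": True,
    "warez.example.net": False,
}

# Sites with no cached verdict wait here for the background classifier.
pending_classification = deque()


def site_of(url):
    """Extract the lowercased host name from an impression URL."""
    return (urlparse(url).hostname or "").lower()


def check_impression(url):
    """Return 'ok', 'alert' or 'unknown' for a single served impression."""
    site = site_of(url)
    verdict = verdict_cache.get(site)
    if verdict is None:
        # Never seen before: hand it to the asynchronous classifier.
        pending_classification.append(site)
        return "unknown"
    return "ok" if verdict else "alert"


def monitor(impressions):
    """Scan (campaign_id, url) pairs and collect the ones needing an alarm."""
    alerts = []
    for campaign_id, url in impressions:
        if check_impression(url) == "alert":
            alerts.append((campaign_id, site_of(url)))
    return alerts


sample = [
    ("c1", "http://news.example.com/story"),
    ("c2", "http://warez.example.net/download"),
    ("c3", "http://blog.example.org/"),
]
alerts = monitor(sample)
print(alerts)
print(list(pending_classification))
```

The point of the split is that the hot path only ever does a cache lookup; anything expensive (classification, revalidation) happens elsewhere and simply updates the cache.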

It's been fun, and I'm feeling particularly proud, simply because I did it on a shoestring (actually, more like a single quad-core computer with 8GB of RAM running Ubuntu). As I have mentioned before, my company isn't a large company. It does not have many resources - and yet against all odds, I think we did quite well in comparison with other much larger and better funded companies like Adconion. Alerts DO happen and we have been able to take almost instantaneous action, and over the past few months, the alert levels have dropped to a level I can call 'relaxing'. I have also learned a lot of new stuff, both technical (in particular, software engineering) and with regards to management (in particular, management of teams and software development), which in my opinion is a good thing. (To wit, I still suck at managing software development. The only reason why I could develop 4 versions in 6 months was because there was only one person writing the code - it's easier to work alone than in teams.)

Ah well. There you go, my bragging is done. I hope you enjoyed my 2000 words of bragging and rambling. I shall have to go back to doing work - Pressyo work this time.