thebaumblog: Man Versus Machine

The Great Firewall of China: Internet Censorship Run Wild

The past couple of days I’ve been visiting China, meeting with some of our technology and channel partners. It just so happens I was present in Beijing for the 20th anniversary of the 1989 Tiananmen Square events. Yes, it really did happen, despite what the Chinese government says. Speaking on Saturday at the F5 APAC Sales Kickoff, I found myself staying over the weekend with Sunday off to roam around Beijing like a tourist, something I rarely get a chance to do on business trips. It is amazing to me to see how the Chinese and Taiwanese work on Saturdays. In the US we rarely see that. Europeans chastise Americans for working too hard, but I guess they should really see the work ethic in Asia, and then we’d look more normal.

Watching the 2008 Beijing Olympics last summer, things there certainly seemed more normal than 20 years ago, but being there in person with all the festivities gone, things seemed really strange to me. It is very difficult to describe. Maybe I was jaded by all the newspapers I’d read on the way to Beijing. On a nice long 13-hour flight from Washington DC with plenty of reading material, I consumed James Kynge’s piece in the Financial Times questioning whether the Western media really understood why the student demonstrators were protesting. He went on to question ascribing the word “democracy” to the students’ motivations, and whether we or they really knew what it meant, despite the fact that he spells out their desires in plain old English, which sounds like democracy to me.

“Almost everything fell within its scope: campaigns against corruption, nepotism, inflation, police brutality, bureaucracy, official privilege, media censorship, human rights abuses, cramped student dormitories and the smothering of democratic urges. But to say the demonstrations were to “demand democracy” is an oversimplification.”
James Kynge, Financial Times

It’s almost impossible to describe the strange feeling I got while walking through Tiananmen Square observing the soldiers and the huge portrait of Chairman Mao that dominates the landscape. Maybe part of it was due to the increased tension of the anniversary. Maybe not. Tiananmen has come to symbolize the unspoken and largely unrecognized tension between the economic progress driving modern China and the old-fashioned communist government still ruling there. The Chinese seem to have a foot in both camps. The eeriness I felt came not only from my surroundings and an understanding of the principles they stood for but also from the reaction of my Chinese and Taiwanese friends. Their usually jubilant, outgoing personalities were completely subdued in the square. Was it a sign of respect and mourning that drove their thoughts? Perhaps to some extent. But in quiet whispers and conversations out of earshot of any “green” uniformed soldiers (versus the “blue” uniformed security guards), they confessed to being actually scared to speak for fear of someone or something listening. Challenging them, I said, “Surely you must be joking.” But it was no joke. Only when we crossed the street into the Forbidden City did their usual personalities return.

Of course this began a prolonged conversation over the next 24 hours as we visited the Great Wall, ate at a new Beijing restaurant and departed through the impressive new Beijing airport. I kept asking and trying to understand. How can a country of so many people be controlled by the minds of so few? What are the real limitations on speaking out? And what effect will economic progress have on the political future of China? There was no shortage of stories supporting the fact that the government still takes a very heavy hand to those who disagree. But rather than discuss it, everyday Beijing seems to sweep the events of 20 years ago under the rug. As one of my Chinese friends said, “everyone is embarrassed and we just pretend it never happened.”

At the same time I was traveling throughout China, the articles started pouring in about Beijing’s efforts to step up Internet and IT censorship. Reading the perspectives on “Green Dam”, I was reminded of the impact the technology industry is having on the whole situation. It was bad enough that I couldn’t get to sites like Twitter and YouTube from my hotel room. Now the Chinese government is requiring that every PC sold in the country starting July 1st have special software blocking all sorts of things. The move is being presented as an attempt to protect children from online pornography but is obviously one more attempt by Beijing to take its censorship to a new level. China currently has the world’s most sophisticated and multi-layered system of Internet censorship. Objectionable content on domestic Web sites is deleted or prevented from being published, and access to a large number of overseas Web sites is blocked or “filtered.” Decisions about what to censor are based on the Chinese government’s attempts to control the minds of 1.2B Chinese. There is no transparency or accountability, no public consultation in developing block lists or censorship criteria, and no way to appeal the blockage or removal of Web content.

In a notice to PC makers, the Ministry of Industry and Information Technology (MIIT) said all PCs shipped in China needed to offer Green Dam/Youth Escort, identified as “green internet filtering software”, either pre-installed or as part of basic software packages. In May 2008, the government picked Jinhui Technology and Dazheng Language Technology, two Chinese software companies, to develop the software, according to a contract award notice from the MIIT. These companies claim their software is only being used to block sites. Last year, however, researchers discovered that a Chinese version of Skype contained the ability to block politically sensitive words in instant messaging chats, and to keep a record of the use of such words.

Conficker is Proof We Need to Log Broadly and Analyze Deeply

At RSA this week it’s easy to get lost in the menagerie of security technologies to conquer malware proliferation, stomp out spam and protect virtualized and cloud computing environments. But the most recent statistics show we are still losing the war on cybercrime. Symantec’s latest Internet Security Threat Report cited 1,656,227 malicious-code threats last year and 75,158 new active bot-infected computers per day. And yes, the United States is still the country most frequently targeted by denial-of-service attacks, accounting for 51% worldwide, and the top country for underground economy servers advertising stolen credit cards, accounting for 67% of all activity worldwide.

Why are we losing so badly? Not surprisingly, there was a lot of talk at RSA about the Conficker worm. Some of the chatter points to reasons why the security industry is falling behind. At first glance, the Conficker worm looks harmless. So far there are not too many significant reports of infected machines and hijacked data, but it may be too early to feel so smug about it. The worm’s real danger is its demonstrated ability to evade the expensive IDS technology enterprises have put into place and rely on today. Estimates are that 90% of the enterprise IDS implementations have failed to detect the worm’s presence and create some kind of actionable alert. How can this be?

Conficker’s properties are simple but different from the typical threat. First, Conficker affected systems outside of IDS coverage, like USB keys and mobile users’ laptops. So if you’re looking only for attacks from outside your network, you won’t see it. It’s a “walk-in virus”. Second, it isn’t greedy like Code Red and other viruses of late. The Conficker worm has built-in sleep cycles. So where a typical worm might scan 1,000 or 10,000 IPs a minute, Conficker was happy to scan maybe 100 and evade the baseline trip wires. Third, Conficker is very selective with its payload delivery. It only delivers when it sees a vulnerability. All this helps Conficker evade IDS systems that want to witness the crime. But Conficker is the perfect crime in that it goes undetected. With no payload delivered and seemingly fewer IPs scanned, there is no grossly abnormal behavior to witness. The evidence is circumstantial.

At a lunch on Wednesday, Tom Le of BT gave a good overview of how BT Managed Security Services detected Conficker for their customers. It was one of the first times I’ve really been sold on a managed security service beyond the value of cost and convenience.

First, as Tom explained it, they started by assuming IDS would miss the attack. They didn’t assume a payload had to be delivered and didn’t assume that a large number of scans was needed to indicate the presence of an intruder. Instead of depending on IDS, BT uses logs and events to baseline the natural behavior of even NetBIOS-triggered scans (which Conficker happened to use) and was able to alert on small changes in scans that would be missed if you were only looking at things like NetFlow. As it turns out, most firewalls blocked the NetBIOS scans going out, so again most customers didn’t even know they had the Conficker worm present.
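
To make the baselining idea concrete, here is a minimal Python sketch. It is not BT’s implementation, which was not described in that detail. It assumes you already have per-host counts of blocked outbound NetBIOS probes per time interval, pulled from firewall logs; the function name, the sample addresses and the three-sigma cutoff are all illustrative assumptions. The point is that each host is compared against its own history rather than against one global trip wire.

```python
from statistics import mean, pstdev

def scan_anomalies(history, current, min_sigma=3.0, floor=1.0):
    """Return hosts whose latest scan count stands out from their own baseline.

    history: dict of host -> list of past per-interval probe counts (assumed input)
    current: dict of host -> probe count in the latest interval
    """
    flagged = []
    for host, count in current.items():
        past = history.get(host, [])
        if not past:
            # No baseline yet: any probing from an unseen host is worth a look.
            if count > 0:
                flagged.append(host)
            continue
        base = mean(past)
        # The floor keeps very quiet hosts from having a near-zero spread.
        spread = max(pstdev(past), floor)
        if count - base > min_sigma * spread:
            flagged.append(host)
    return sorted(flagged)

history = {
    "10.0.0.5": [2, 3, 1, 2, 2],        # quiet desktop
    "10.0.0.9": [400, 420, 390, 410],   # busy file server
}
current = {"10.0.0.5": 40, "10.0.0.9": 430}
print(scan_anomalies(history, current))  # ['10.0.0.5']
```

A fixed threshold of, say, 100 probes per interval would get this exactly backwards: it would flag the busy file server and miss the desktop that just went from 2 probes to 40.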

Second, Tom and his team assumed some type of command and control activity associated with Conficker. They followed the money, watching for things like Conficker trying to phone home in different ways. By having a broad set of logs and events from switches, routers, applications and IDS, they were able to look for outlying behaviors like DNS lookups to obscure locations not typically seen in customer networks and aggregate this information across customers to identify common abnormalities. Tom estimates that BT sees roughly five billion messages a week across their customer base. That’s a lot of messages.
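
The cross-customer aggregation can be sketched in a few lines of Python. Again, this is an illustration under assumptions, not BT’s method: it assumes DNS logs have already been reduced to (customer, domain) pairs and that you maintain some set of domains you consider normal. The function name and all sample customers and domains are made up.

```python
from collections import defaultdict

def rare_domains_across_customers(lookups, known_domains, min_customers=2):
    """Find unfamiliar domains that show up in more than one customer network.

    lookups: iterable of (customer, domain) pairs taken from DNS logs (assumed input)
    known_domains: set of lowercase domains considered normal
    """
    seen_by = defaultdict(set)
    for customer, domain in lookups:
        domain = domain.lower().rstrip(".")  # normalize case and trailing dot
        if domain not in known_domains:
            seen_by[domain].add(customer)
    return {d: sorted(c) for d, c in seen_by.items() if len(c) >= min_customers}

lookups = [
    ("acme", "www.example.com"),
    ("acme", "qx7hd2.example.net."),
    ("globex", "QX7HD2.example.net"),
    ("globex", "intranet.globex.example"),
    ("initech", "www.example.com"),
]
known = {"www.example.com"}
print(rare_domains_across_customers(lookups, known))
# {'qx7hd2.example.net': ['acme', 'globex']}
```

One odd lookup inside one network is noise. The same odd lookup appearing in unrelated networks is the kind of common abnormality that only shows up when the logs are pooled.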

After listening to all the chatter about Conficker and walking the show floor, it gets easier to understand how criminals continue to evade the security infrastructure enterprises put in place. There are just too many ways in which breaches can occur and there is just too much data scattered about to collect and correlate in order to find the anomalies. So the security industry continues down the path of specific solutions to specific vulnerabilities and criminals continue to create new threats that evade the industry’s point approaches. I say the industry as a whole needs to move to a more adaptable and flexible approach that can apply security to whatever threats arise, when they appear.

The best real-world detectives are able to piece together seemingly circumstantial evidence and sift out the clues that lead to catching criminals. But every time it’s different. Perhaps we need to take the same approach in order to obtain more adaptable security solutions. Assume every time is different, not the same.

Logging broadly and analyzing deeply is one of the best defenses. Without a broad swath of data you won’t have the pieces of the puzzle to put together at the moment you need to solve the crime.

Few criminals are caught in the act.

Human and Machine Language Mashups at Splunk Live Zurich, Switzerland

At Splunk Live in Zurich this week an interesting discussion erupted about human and machine languages. Before I continue with the story, I want to thank everyone that attended the event. Despite the fact that Raffy Marty is a resident celebrity, this was our first formal customer and partner event in Switzerland. We had more than 50 people attend for several hours to talk about Splunk and data center management challenges. The event was co-hosted by T-Systems.

Thank you Meno Schnapauff for your great presentation on how T-Systems and the Swiss National Railway are using Splunk!

Other attendees included folks from Swisscom, Unicom Consulting, Rothschild Bank, Genossenschaft Migros, LeShop, Netcetera, Cablecom GmbH, TBK-Patent Munich, On Line Video 46, Skyguide, PostFinance and the University of Fribourg. Brian Haynes, Tim Thorpe, Julie Duncan and Hash Basu-Choudhuri from our London office participated too.

Now part of the reason I mention all these names (in addition to thanking folks) goes to the point of this post. In the room we had an American (me), several native English speakers from different areas of England, Swiss German speakers from Switzerland and German speakers from Germany. What I noticed is how two people think they speak the same language but can’t always understand each other. It turns out there are a lot of American (some West Coast) colloquialisms I use that my “Queen’s English” counterparts don’t understand. And of course most of the time I try to make a joke the Swiss and Germans just look at me like I’m from outer space, even though if you asked them they’d say they speak fluent English. During the event the Swiss Germans had trouble understanding the Germans and the Germans had trouble understanding the Swiss Germans. The folks from the UK who spoke German didn’t understand either the Swiss German or the German German, although they all claim to speak German.

What does all this have to do with IT, you ask? Well, it turns out that mashing up languages and attempting to understand each other even though we don’t speak exactly the same language is one of the biggest problems we have in trying to understand our IT systems as well.

“One of the questions posed at the event was how can I modify my system and application logging to some standard in order to follow what my systems are doing? Do we need a logging standard?”

I have long been telling people that logging standards are a waste of time. IBM’s Common Base Events (CBE) has been around for decades and has very little traction in the real world. Data Center Mark-up Language (DCML) was pushed by Opsware and lots of smart people. It got nowhere. Logs exist. Instrumentation exists. Our IT systems already have tremendous amounts of data. Trying to retrofit that data to some standard is impossible. Attempting to organize a multi-vendor logging standard will never happen. Getting developers to log consistently sounds great but I’ve never seen it done before.

What we need is a mashup of machine languages and logging formats. That’s exactly what IT Search is!

Humans need to stop thinking about how we can format data to make it easier for machines to work with it. There is too much data. The real value is being able to work with massive amounts of data without any human intervention. This is exactly what Google does for the web. Sure, you can reformat your HTML to get better search results. But even if you do nothing, Google will index your site. You don’t even have to tell Google to do it!
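
As a toy illustration of the idea (and only that: this is not how Splunk or Google works internally), here is a Python sketch that indexes raw log lines from three different formats without any schema, parser or logging standard. The tokenizer, function names and sample log lines are all assumptions made up for the example.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z0-9_]+")

def build_index(lines):
    """Map every token to the set of line numbers containing it. No schema needed."""
    index = defaultdict(set)
    for n, line in enumerate(lines):
        for tok in TOKEN.findall(line.lower()):
            index[tok].add(n)
    return index

def search(index, lines, query):
    """Return the raw lines containing every term in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [lines[n] for n in sorted(hits)]

lines = [
    "Jun  4 10:02:11 fw01 sshd[311]: Failed password for jsmith from 10.1.1.7",
    '10.1.1.7 - jsmith [04/Jun/2009:10:02:15] "GET /admin HTTP/1.1" 403',
    "2009-06-04 10:02:20 ERROR billing user=jsmith action=export status=denied",
    "Jun  4 10:03:00 fw01 kernel: link up eth0",
]
index = build_index(lines)
for hit in search(index, lines, "jsmith"):
    print(hit)
```

Searching for `jsmith` returns the syslog line, the web access line and the application log line together, even though nobody told the index what a username is or where each format keeps it.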

I’m going to start sharing more of our experiences helping people see the connections that already exist in their logging data. While the connections are not always obvious to the naked eye and human linear thinking, machines are great at teasing out non-obvious relationships. This is perhaps the most compelling thing we work on at Splunk as we continue to push the bleeding edge of what’s possible.

Life after SIEM. Situational Awareness is next.

We’ve been hearing a lot lately about the death of SIEM technologies. But isn’t the question less about a legacy technology dying and more about the dimensions on which the next mass-adopted security capability will be born? Clayton Christensen first described a model for disruptive technology in his book The Innovator’s Dilemma and his follow-on The Innovator’s Solution. Christensen describes a theory about how disruptive technologies overtake sustaining technologies by delivering value on new dimensions that established vendors overlook as unimportant or low end, or just don’t think about because they’re too busy improving their legacy. Christensen’s work offers an interesting framework to think about what’s taking place in the market for SIEM security management solutions.

Any enterprise trying to secure its IT infrastructure knows the state of the art in SIEM security approaches falls short. And trends like virtualization are making things even more difficult. System and security administrators and analysts are inundated with too many potential incidents and it’s too difficult and time consuming to investigate even a fraction of them. Achieving a greater comprehension of the meaning of potential incidents and the projection of their status in the near future is the real goal. The idea, called “situational awareness”, is often, however, impossible to achieve. We are so dependent on pre-programmed rules in our SIEM solutions that we lack the ability to perform our own analysis because the original raw data has been filtered out or thrown away, or we have no practical way to make sense of it.

Observation: If the technology is sufficiently complex as to allow the vulnerability to exist, can we really build complex technology to catch all the possible issues or scenarios?

As a reference point see David Hazekamp, Security Architect at Motorola, talk about the importance of retaining all security data across the Motorola global SOC infrastructure and integrating access to all this data into existing SIEM solutions.

Of course reaching this understanding requires that one set aside any belief in the effectiveness of current SIEM security technologies. Usually this means you’re not a vendor or you’re a vendor with little or no vested interest in current approaches. So with this let’s examine the typical enterprise deployment of security technologies.

Defense in Depth

This is where every good enterprise security architecture starts. In order to begin securing your environment you’ve got to have data, raw data. In most data centers this takes the form of syslog from network devices and servers, SNMP traps, OPSEC or LEA interfaces for firewall events, WMI for Windows desktop and server events, IDS and IPS signature scans and application level firewall examination of common services like FTP, HTTP, SFTP, SCP etc. The thinking is you need to look at everything. Perhaps you’ll even want to pull in information from physical security systems like badge readers.

Security Information Management (SIM)

The next step in the process is to manage all this raw data and filter it down to a manageable number of events, traps and alerts. Collecting, storing and providing some basic analysis on all this data is the job of a SIM. Typically, as Raffy points out, the data is parsed, normalized and stored in a structured RDBMS. Parsing, normalizing and structuring all this data is great if the data doesn’t change or you don’t have too much of it. But if you’re dealing with data formats that aren’t static or you’re trying to store terabytes of this data an RDBMS won’t be your friend.
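
Here is a small Python sketch of why parse-and-normalize gets brittle. It is a made-up normalizer, not any particular SIM product: one regular expression turns sshd failure messages into structured rows, and anything that does not fit the expected shape falls out the other side. The pattern, field names and sample lines are assumptions for illustration.

```python
import re

# Hypothetical parser for one message format the normalizer was written against.
SSHD = re.compile(r"sshd\[\d+\]: Failed password for (?P<user>\S+) from (?P<src>\S+)")

def normalize(lines):
    """Split lines into structured rows and the lines the parser could not handle."""
    rows, unparsed = [], []
    for line in lines:
        m = SSHD.search(line)
        if m:
            rows.append({"event": "login_failure", "user": m["user"], "src": m["src"]})
        else:
            unparsed.append(line)
    return rows, unparsed

lines = [
    "Jun  4 10:02:11 fw01 sshd[311]: Failed password for jsmith from 10.1.1.7 port 22 ssh2",
    "Jun  4 10:05:42 fw01 sshd[311]: Failed password for invalid user admin from 10.1.1.9 port 22 ssh2",
]
rows, unparsed = normalize(lines)
print(len(rows), "normalized;", len(unparsed), "dropped")  # 1 normalized; 1 dropped
```

The second line is arguably the more interesting one (someone guessing at an account that does not exist), and a small variation in wording is all it takes for it to miss the structured store, unless the raw line is kept somewhere.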

Security Event Management (SEM)

Once a SIM has done its job you’re ready to aggregate, correlate and start reporting on potential incidents using a SEM to do the job. SEMs usually consist of lots of rules that look for combinations and patterns of events indicating that a possible attack or breach may be underway. Essentially the SEM rules attempt to codify what we humans know about vulnerabilities in our IT systems and possible ways to exploit them. The goal is to provide some real-time information, usually in the form of reports, dashboards and visualizations, to operations and security analysts who work to keep the infrastructure secure.
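
For readers who have not seen one, a correlation rule of this kind might look like the following Python sketch. It is a hypothetical rule, not taken from any SEM product: alert when a source racks up several failed logins inside a short window and then succeeds. The event shape, threshold and window are assumptions.

```python
def brute_force_then_success(events, threshold=3, window=60):
    """Alert when a source has >= threshold failures within `window` seconds
    followed by a success.

    events: list of (timestamp_seconds, source, outcome) tuples, sorted by time
    """
    recent_failures = {}
    alerts = []
    for ts, src, outcome in events:
        # Keep only this source's failures that are still inside the window.
        fails = [t for t in recent_failures.get(src, []) if ts - t <= window]
        if outcome == "fail":
            fails.append(ts)
        elif outcome == "success" and len(fails) >= threshold:
            alerts.append((ts, src))
            fails = []
        recent_failures[src] = fails
    return alerts

fast = [(0, "a", "fail"), (10, "a", "fail"), (20, "a", "fail"), (25, "a", "success")]
slow = [(0, "b", "fail"), (100, "b", "fail"), (200, "b", "fail"), (210, "b", "success")]
print(brute_force_then_success(fast))  # [(25, 'a')]
print(brute_force_then_success(slow))  # []
```

The rule encodes exactly what its author knew to look for. The patient attacker in the second example produces the same failures and the same success, just spaced out, and the rule stays silent.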

Situational Awareness (SA)

SIEM correlation can be interesting for discovering a pattern or related event, but the ability to work an issue outside of these “canned” rules and events becomes the real problem. Unfortunately, what all too often happens is that there are so many possible attacks that operations and security staff are overwhelmed with potential incidents to investigate, and not every event or pattern of interest is going to be discovered via the pre-built rules. Situational awareness is the attempt to perceive environmental elements within a volume of space and time. Comprehension cannot be achieved if the data being bubbled up is filtered according to a set of rules and the technology does not allow a human to perform their own analysis of the raw data as generated by the environment itself. All technologies have their weaknesses and those that perform correlation are no different.

Thus whilst canned SIEM correlation provides value in bubbling things up, we still need the ability to dig into the raw data to fully perceive and comprehend what is taking place. Mind you, SA is not a new concept. It has been applied rather robustly by decision-makers in complex, dynamic areas such as aviation, air traffic control, power plant operations and military command and control, as well as in more ordinary but nevertheless complex tasks such as driving an automobile or motorcycle. And yes, it has been mentioned before in security operations, particularly in government agencies.

Man Versus Machine: Part One

Recently I gave a talk at the BT annual technology gathering. The setting was a really beautiful estate called The Grove, just north of London in Hertfordshire, England. A couple hundred of BT’s smartest technology managers were in attendance and I was supposed to think of something to hold their interest for an hour. I got to thinking about all the technology and infrastructure BT must have and how in the world they manage it. I started gathering data. With internal growth, new projects like BT’s 21st Century Network and acquisitions over the past decade through BT Global Services outsourcing contracts, the company has a lot of IT infrastructure.

  • 74 data centers,
  • 163 countries,
  • 3,000 applications,
  • 6,000 different types of systems/devices and
  • 17,000 IT staff (6,000 BT and 11,000 outsourced).

I also spent a few hours with some of BT’s brightest architects who are working on attempts to virtualize every layer of their infrastructure: network, storage, database, application, web servers, VoIP, collaboration, ordering, billing, provisioning, monitoring etc. “What’s your biggest problem?” I asked. Resoundingly it was “our customers are still often the ones that tell us stuff is broken.” This was so reminiscent of my time at places like Yahoo! where we’d have these 7×24 war rooms during key outages and the daily conference calls with 30-40 people on the line all emailing logs and configurations to each other.

As our IT infrastructures become incredibly complex, dynamic, service oriented, virtualized and mission critical we’re confronted with this battle raging in our data centers. And it appears the machines are winning and the humans are losing.

Our biggest problem is figuring out: did something go wrong? Why? Where does truth lie? According to market researcher IDC, more than $140B was spent in 2007 managing the world’s data centers. IT OPEX is growing at 2.5 times the rate of hardware spend, and one third to one half of TCO is spent recovering from problems. The cost of availability now dwarfs the purchase and maintenance cost of technology.

So what have we as an IT industry done to address the problem?

We’ve created concepts like ITIL and CMDBs. While there are some good process improvements here for sure, these top-down modeling approaches and pre-determined rules only tell us what we already know. In my experience it is not the things we already know about that bite us in the ass and take our systems down for prolonged periods of time. It’s the multitude of unanticipated and unavoidable dependencies and interactions that take place in a complex system. And it’s impossible to know what set of dependencies and interactions will cause downtime until it occurs. Our infrastructures are just too indeterminate. That’s the point after all. Tier it, load balance it, virtualize it, so we don’t have to worry about the dependencies and interactions among all the different components. Well, guess what? We do have to care. Because we have to fix it when it goes wrong.

Take the analogy of a complex air traffic control system. Sure the air traffic controllers feel really great when they arrive at work in the morning. They’ve got their coffee, flight plans and a good handle on the early morning inbound and outbound traffic.

Then the day gets a bit more challenging. Weather conditions over Chicago back up landings at O’Hare. A baggage handler and mechanic strike slows down JFK departures. A pilot radios he’s three degrees north over Pennsylvania, but where is he really? Now you need radar. Throw the flight plans out the window. You need to know what’s actually happening now.

So how do we establish the equivalent of radar for a complex IT infrastructure? Component monitoring doesn’t work any more. If the problem is a single component failure, we already know about it. We’ve already automated the swapping in of a new machine or device. And we can reboot software components automatically. IBM has its own marketing play on this called “Autonomic Computing”, but that too seems to focus only on the simple single-component issues, not the indeterminate chaos that ensues in a real running system. And it seems like more slideware than real solutions.

In my next post I’ll tackle the issue of how we might look at things differently.

Stay tuned.

Welcome!

I’m Michael Baum. Welcome to my blog.

I hope to find time to write about some of my favorite topics including:

  • Splunk and IT Search.
  • Technology gadgets and software — the stuff we all like to use.
  • Datacenter applications, servers, networks and security — the stuff we all have to keep running.
  • Business, entrepreneurship and venture capital.
  • Wall Street and investing.

Comments are always welcome and you can also reach me via email at thebaum (at) splunk (dot) com.