why the focus on devops

I’ve built more than 1400 virtual machines in the past 18 months.

how do you manage that many machines?


I think that the most important piece is the PXE server. I just don’t think that enough people fiddle around with the different options.



for a second when I read this article.. I really expected there to be a programming language named turducken

To compound matters, this is a simple or “standard” column family. It is fairly limited by itself. Many column family databases, including Cassandra, include the concept of a supercolumn and a supercolumn family. Structurally, it is a bit like turducken in turducken. The example below shows a supercolumn family containing column families.
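As a rough illustration of that turducken-in-turducken nesting, a supercolumn family can be modeled as three levels of maps: row key, supercolumn name, then columns. The row key, supercolumn names, and values below are all hypothetical, just to show the shape:

```python
# Hypothetical supercolumn family modeled as nested dicts:
# row key -> supercolumn name -> column name -> value
supercolumn_family = {
    "user:1234": {                      # row key
        "address": {                    # supercolumn
            "street": "123 Main St",    # columns inside the supercolumn
            "city": "Seattle",
        },
        "contact": {                    # another supercolumn
            "email": "user@example.com",
            "phone": "555-0100",
        },
    }
}

# Reading a single column means walking all three levels
city = supercolumn_family["user:1234"]["address"]["city"]
```

A plain column family would stop at two levels (row key to columns); the extra supercolumn layer is what makes it "turducken in turducken."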



recursive queries

I love recursion, and parent-child relationships.
I just found a wonderful example, and had to share the experience.

(note, I am not the author, just republishing)

Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!

We’ll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.


With Recursive

Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.

Recursive CTEs are more powerful – they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a for loop in other programming languages.

These CTEs have two parts — an anchor member and a recursive member. The anchor member selects the starting rows for the recursive steps.

The recursive member generates more rows for the CTE by first joining against the anchor rows, and then joining against rows created in previous recursions. The recursive member comes after a union all in the CTE definition.

Here’s a simple recursive CTE that generates the numbers 1 to 10. The anchor member selects the value 1, and the recursive member adds to it up to the number 10:

with recursive incrementer(prev_val) as (
  select 1 -- anchor member
  union all
  select -- recursive member
    incrementer.prev_val + 1
  from incrementer
  where prev_val < 10 -- termination condition
)
select * from incrementer

The first time the recursive CTE runs it generates a single row, 1, using the anchor member. In the second execution, the recursive member joins against the 1 and outputs a second row, 2. In the third execution the recursive step joins against both rows 1 and 2 and adds the rows 2 (a duplicate) and 3.

When defined with union (rather than union all), recursive CTEs return only distinct rows. Even though the conceptual expansion above creates many rows with the same value, only a distinct set of rows will be returned.

Notice how the CTE specifies its output as the named value prev_val. This lets us refer to the output of the previous recursive step.

And at the very end there is a termination condition to halt the recursion once the value reaches 10. Without this condition, the CTE would enter an infinite loop!
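The whole incrementer, for-loop analogy included, can be sketched in Python. This is my illustration, not the article's: the `frontier` list stands in for the database's working table of rows produced by the previous step:

```python
# Rough analogue of the incrementer CTE: the anchor member seeds the
# result, then the recursive member runs against the previous step's
# rows until the termination condition stops producing new rows.
rows = [1]          # anchor member: select 1
frontier = [1]      # working table: rows from the previous step
while frontier:
    # recursive member: prev_val + 1 ... where prev_val < 10
    frontier = [v + 1 for v in frontier if v < 10]
    rows.extend(frontier)

print(rows)  # the numbers 1 through 10
```

Notice the termination condition lives inside the loop body, exactly like the `where prev_val < 10` clause; drop it and the loop never ends.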

Under the hood, the database is building up a table named after this recursive CTE using unions.


Recursive CTEs can also have many parameters. Here’s one that increments, doubles, and squares starting values of 1, 2.0, and 3.0:

with recursive cruncher(inc, double, square) as (
  select 1, 2.0, 3.0 -- anchor member
  union all
  select -- recursive member
    cruncher.inc + 1,
    cruncher.double * 2,
    cruncher.square ^ 2
  from cruncher
  where inc < 10
)
select * from cruncher

With recursive CTEs we can solve the Traveling Salesman Problem.

Finding the Shortest Path

There are many algorithms for finding the shortest round-trip path through several cities. We’ll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We’ll then sort to find the shortest.
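The brute-force idea is easy to sketch outside SQL as well. Here is a minimal Python analogue that enumerates every ordering of cities and keeps the shortest round trip; the three-city distance table and city names are made up purely for illustration:

```python
from itertools import permutations

# Hypothetical symmetric distance table, for illustration only
dist = {
    ("A", "B"): 10, ("A", "C"): 15, ("B", "C"): 20,
}

def d(x, y):
    # look up the distance in either direction
    return dist.get((x, y)) or dist[(y, x)]

cities = ["B", "C"]  # every city except the starting city "A"
best = min(
    # total distance: start -> each city in order -> back to start
    (sum(d(a, b) for a, b in zip(("A",) + p, p + ("A",))), ("A",) + p)
    for p in permutations(cities)
)
print(best)  # (shortest total distance, route)
```

The recursive CTE below does the same enumeration, except the "permutations" are grown one city at a time by the recursive member.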

First, a list of cities with Periscope customers, along with their latitudes and longitudes:

create table places as (
    select 'Seattle' as name, 47.6097 as lat, 122.3331 as lon
    union all select 'San Francisco', 37.7833, 122.4167
    union all select 'Austin', 30.2500, 97.7500
    union all select 'New York', 40.7127, 74.0059
    union all select 'Boston', 42.3601, 71.0589
    union all select 'Chicago', 41.8369, 87.6847
    union all select 'Los Angeles', 34.0500, 118.2500
    union all select 'Denver', 39.7392, 104.9903
)

And we’ll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):

create or replace function lat_lon_distance(
  lat1 float, lon1 float, lat2 float, lon2 float
) returns float as $$
declare
  x float = 69.1 * (lat2 - lat1);
  y float = 69.1 * (lon2 - lon1) * cos(lat1 / 57.3);
begin
  return sqrt(x * x + y * y);
end;
$$ language plpgsql
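As a quick sanity check of that formula, here is a direct Python port (my addition, not part of the original article). The constant 69.1 is roughly the miles in one degree of latitude, and 57.3 is roughly 180/pi, converting degrees to radians for cos():

```python
import math

def lat_lon_distance(lat1, lon1, lat2, lon2):
    # ~69.1 miles per degree of latitude; a degree of longitude
    # shrinks with cos(latitude), where 57.3 ~= 180/pi converts
    # degrees to radians
    x = 69.1 * (lat2 - lat1)
    y = 69.1 * (lon2 - lon1) * math.cos(lat1 / 57.3)
    return math.sqrt(x * x + y * y)

# San Francisco to Seattle, roughly 680 miles
print(round(lat_lon_distance(37.7833, 122.4167, 47.6097, 122.3331)))
```

It is a flat-earth approximation, but plenty accurate for ranking routes between US cities.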

Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:

with recursive travel(places_chain, last_lat, last_lon,
    total_distance, num_places) as (
  select -- anchor member
    name, lat, lon, 0::float, 1
    from places
    where name = 'San Francisco'
  union all
  select -- recursive member
    -- add to the current places_chain
    travel.places_chain || ' -> ' || places.name,
    -- the newly visited place becomes the last place
    places.lat, places.lon,
    -- add to the current total_distance
    travel.total_distance +
      lat_lon_distance(last_lat, last_lon, places.lat, places.lon),
    travel.num_places + 1
  from places, travel
  -- never revisit a place already in the chain
  where position(places.name in travel.places_chain) = 0
)

The parameters in the CTE are:

  • places_chain: The list of places visited so far, which will be different for each instance of the recursion
  • last_lat and last_lon: The latitude and longitude of the last place in the places_chain
  • total_distance: The distance traveled going from one place to the next in the places_chain
  • num_places: The number of places in places_chain — we’ll use this to tell which routes are complete because they visited all cities

In the recursive member, the where clause ensures that we never repeat a place. If we’ve already visited Denver, position(...) will return a number greater than 0, invalidating this instance of the recursion.

We can see all possible routes by selecting all 8-city chains:

select * from travel where num_places = 8

We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco’s lat/lon, but a join is more elegant. Once that’s done we sort by distance and show the smallest:

select
  travel.places_chain || ' -> ' || places.name,
  total_distance + lat_lon_distance(
      travel.last_lat, travel.last_lon,
      places.lat, places.lon) as final_dist
from travel, places
where travel.num_places = 8
  and places.name = 'San Francisco'
order by 2 -- ascending!
limit 1

Even though this query is significantly more complicated than the incrementer query earlier, the database is doing the same things behind the scenes. The top branch creates the CTE’s rows; the bottom branch is the final join and sort.


Run this query and you’ll see the shortest route takes 6671 miles and visits the cities in this order:

San Francisco -> Seattle -> Denver ->
Chicago -> Boston -> New York -> Austin ->
Los Angeles -> San Francisco

Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!


Windows Server Hyper-V with Deduplication – 400 to 1 ratio

I’ve been playing with the Linux world for the past few months. I’ve learned some new stuff.. seen some interesting scripts.

I’ve been having fun.

But the punchline. We ALL know that Linux just STILL AIN’T CLOSE to as cool as Windows Server.

Check out these storage ratios that I am getting

I am getting a 400 to 1 ratio of deduplication / compression.


buy one or two Red Hat Enterprise licenses and use CentOS everywhere else.

I ran across this quote on Stack Overflow today.. I just HAD to share.

“most of my customers do is have only one or two genuine RedHat systems, and run most of their computers on CentOS, which is a free rebuild of RedHat. If a problem is found, it is reproduced on the RedHat systems, and the vendor will happily support the issue from there.”  http://unix.stackexchange.com/questions/6945/what-makes-centos-enterprisey-compared-to-generic-distributions-like-ubuntu

This seems like a TERRIFIC strategy for maximizing your ROI.. and a good way to spend a LITTLE bit of money.. and make it stretch.

HOW HAVE I *NOT* HEARD OF THIS STRATEGY? Although I STARTED playing with Linux about 12 years ago.. I never really have been happy with it.. It seems like there was ALWAYS one or two things that just killed my attention for Linux.  For example.. I didn’t REALIZE that apt-get would upgrade mySQL.. for the LONGEST time.  That is like.. the most fascinating thing in the world to me.  Upgrading mySQL.. is one thing.. that has ALWAYS made me terrified. I just do NOT want to have to troubleshoot a ‘mysql backup’ by hand.. Just AMAZED with how flaky mySQL is with some things.

I’ve ALWAYS had a chip on my shoulder.. about mySQL performance. It has always seemed like WordPress runs WWWWAAAAAAYYYYYYYY too slow when it gets about 100k records.. and I’m a TOTAL snob about performance.  I just insist.. that the indexing options in mySQL are COMPLETELY inferior to MS SQL Server.   but it’s just NOT WORTH FIFTY GRAND PER PROCESSOR for Microsoft.

Performance is always a bit of a touchy subject for my clients.  Not BAD performance.. it’s just that they’re SCARED TO ASK FOR GOOD PERFORMANCE.

I swear.. MOST clients I’ve had over the past 15 years… for some reason claim that they ‘don’t care about performance’.  I really don’t get it. of COURSE everyone wants better performance..   Why are people TERRIFIED to care about performance. Is it the PRICE TAG?

Looking back.. I have recommended Linux / BSD / Debian / Ubuntu systems.. at my last couple of jobs.. and I’m just NOT HAPPY working in a bigoted shop.   I’m not sure I can EVER work in a ‘Microsoft-Only’ shop ever again.  It’s like RACISM to me… when people won’t consider Linux.. ‘just because they don’t know it’.  That can’t be an excuse.. not for another day.   I *USED* to be the ‘most mindless Microsoft drone ever invented’.  My backyard.. growing up.. was about 100 yards from Redmond city limits.   But YEAR after YEAR.. when Microsoft keeps on changing things ‘just for the hell of it’.. I’m sick of it.  I’m TOO CONSERVATIVE to be on the Microsoft stack anymore.  Fully 80% of the time that I’ve invested.. in training, and practicing the Microsoft skillset.. has been flushed down the drain… because Microsoft *ALWAYS* has to come out with the ‘latest and greatest’.

I’m just SICK and tired of being tied to a single vendor.

I’ve been TRYING to push Linux.. for a FEW years now… And now that I’m emotionally involved with WordPress.. I’ve really set out to expand my horizons.. and learn a LOT more.

I just LOVE the challenge. I guess that Windows has BORED me.  I really feel like I haven’t had a challenge in the Microsoft world.. in what FEELS like a long time.

I *NEVER* saw the excitement.. in trying to work my butt off for a decade.. focusing on learning TEN LANGUAGES to replace VB6.  I think that Microsoft lost the war.. when they alienated their best constituency in 2002.



App Integration vs Data Integration – Who Will Win?

I was at Starbucks the other day (surprise, surprise) with my friend Matt whose company was getting ready to invest significant time and money to integrate the ERP systems of a company that they had just acquired. I asked him why bother integrating operational systems (except to ensure that the different operational systems are using the same customer, product, store, and other master files), unless there is significant cost savings.


Instead of trying to integrate the operational systems, why not invest in integrating the data that comes out of those operational systems? As long as you can enter orders and pay people in a timely manner, who cares if you can capture an order a sub-second faster or pay someone seconds faster than before? Is integrating your transactional data capture really the best place for IT to invest their precious resources in today’s competitive world?

ERP is so old school. I wish I had known back in the 1990’s and early 2000’s what I know now:  that trying to create competitive differentiation within or across packaged, monolithic ERP, MRP, CRM, SFA, and other operational systems only benefits the ERP vendors and the systems integrators whose business models are built on the endless customizations to those ERP systems.

If you’re an organization considering an ERP system upgrade or integration, you seriously need to consider how much you want to invest in customizing that ERP system that, at best, just delivers business parity. Or should you invest your time, money, and human resources in building customer-facing apps that provide unique customer value, business differentiation, and competitive advantage?

Integrate the Data, Not the Applications

The graphic below nicely summarizes the value creation transformation occurring within IT organizations (see Figure 1). Organizations are realizing that the business value of their operational systems doesn’t lie in their ability to capture an order faster than their competitors, but instead lies in the depth and breadth of data that can be integrated and mined to capture new insight into customers, products, and operations.


Figure 1: Value transition from app-centric to data-centric

It’s a transformation from application-centric mentality (trying to create value in the deployment and customization of monolithic operational application) to a data-centric mentality (mining value out of the wealth of data held captive in those systems).

Figure 2 below shows a typical IT operational environment. Multiple operational systems manage the transaction processing for various business functions like manufacturing, distribution, inventory, payroll, human resources, finance, call centers, sales force automation, etc.


Figure 2: Traditional monolithic operational apps

You can buy these applications from a mega-vendor (who has probably acquired numerous other vendors in order to create a “ransom note” of loosely connected applications), or you can select a best-of-breed approach where a systems integrator tries to tie these applications together. Either approach leads to a brittle, hard-to-scale, expensive-to-maintain architecture and a significant investment in systems integration and consulting resources to keep these “Franken-architectures” running. And what do you get in the end? Nothing more than business parity.

This quote from a Business Week article titled “Plex Systems: Detroit’s New Dashboard” summarizes the ERP value challenge quite well:

Inteva Chief Information Officer Dennis Hodges explains that because each [of their] offices had its own ERP system running on a local server, managers in Michigan had no way of knowing what was happening in Alabama, Mexico, or Poland. The company was spending more than half a million dollars a month on an ERP product that didn’t allow management to look at revenue and margins across the company.

See my blog “Developing Competitive Differentiation” for more thoughts about where best to invest your precious IT resources to deliver competitive differentiation.

The Role of the Data Lake

Don’t invest (waste?) time and money to integrate your disparate operational applications. Instead, invest in a data architecture (see Figure 3) that allows you to integrate all of the data across those disparate operational applications and is able to capture the other 90%+ of the corporate and external data needed to achieve business differentiation.

That investment in data architecture will enable you to differentiate with superior customer service, successful new product introductions, campaign marketing excellence, fraud elimination, predictive maintenance, revenue loss minimization, increasing market basket margins, reducing the number of hospital-acquired infections, lowering hospital readmissions, etc.


Figure 3:  Integrate all of your internal and external data in the data lake

See my blog “How I’ve Learned To Stop Worrying And Love The Data Lake” for advice about how to leverage Hadoop to create a data lake. The data lake not only supports the integration of data across your operational applications, but also enables the integration of other internal data sources (consumer comments, email conversations, clinical studies, technician notes, prescriptions, web logs, etc.) with external data sources (social media, mobile, blogs, newsfeeds, third-party data, data.gov, etc.).

Embracing an Analytics (Data Science) Culture

But collecting the data isn’t enough. You also need a corporate culture that seeks to deploy data science within your key business functions; analytics integrated into your key business processes to uncover new insight into the “strategic nouns” of your business—your customers, products, partners, campaigns, stores, wind turbines, jet engines, ATMs, trucks, etc.

You need a modern architecture that supports your traditional data warehouse and business intelligence environment, while expanding your data and analytic assets to include advanced analytics and data science capabilities.


See my blog “Modernizing Your Data Warehouse Part 2” for more details about how to leverage Hadoop to modernize your data warehouse environment while adding a complementary, advanced analytics sandbox architecture.

Monetizing Customer, Product, and Operational Insights

In the end, the best way to achieve competitive differentiation and uncover new monetization opportunities lies in how you are delivering the insights that you gain from your data lake and advanced analytics environment (see Figure 4).


Figure 4: Analytics powering the Third Platform and the Internet of Things

The rise of the “Third Platform,” those pervasive smartphones and mobile tablets, is enabling organizations to deliver actionable insight to customers, partners and front-line employees alike. It’s enabling organizations to optimize key business processes and capitalize on new monetization opportunities. And for many leading organizations, it’s the culmination of IT becoming a strategic partner to the business. Instead of replicating existing business processes within your transactional systems, it enables IT to transform those key business processes and empower new business models.

For an example, see my blog “The Actionable Retail Manager Dashboard:  Next Generation BI,” which talks about how to integrate the insight gleaned from your advanced analytics system to create the next-generation dashboard—a dashboard that not only delivers business insight, but transforms the dashboard from a passive monitoring tool to a prescriptive recommendation engine to help empower front-line employees and management.

Summary of Best Practices

  1. Leave your operational systems in their silos. Don’t waste time and effort trying to integrate your disparate monolithic operational applications, except to ensure that they are using the same product, customer, store, and other master files.
  2. Integrate the data from your operational applications into a data lake that simplifies the integration problem (it’s easier to integrate data than applications). Focus your IT resources (people, time, and money) on those areas of data integration that create business differentiation, not just business parity.
  3. Augment the value of your operational data by adding new structured and unstructured data (both internal and external) to your data lake. And in the process, develop a corporate hunger for grabbing and integrating data into the data lake, even if you’re not yet sure how you might leverage that data.
  4. Finally, focus on:  building differentiated products; optimizing key business processes; monetizing key customer, product, and operational insights; delivering a more compelling, more engaging customer experience; and empowering front-line employees to make decisions that drive business value.


Duckie.me – Rubber ducking as a service

Rubber ducking is a nickname given to the process of talking through a problem out loud.. I can’t describe how many times.. I have tried giving technical explanations to nontechnical users.. And throughout that verbal process.. I always learned things I couldn’t have imagined.

The concept of ‘rubber ducking’ means.. that instead of WASTING THE TIME of your best customers.. it is more efficient to not pull in another person at all.. they didn’t really contribute much anyways.

We are taught to describe our problems to an inanimate object instead.. in the hopes that merely talking through the problem out loud still produces those valuable insights.

I have used this thought process to help me understand many problems that SEEM insurmountable. Rubber ducking is what keeps me SANE.




Why can't I *SEARCH* for a particular Virtual Machine across all HOSTS??

I have hundreds of virtual machines.. Literally.. Hundreds.  I was looking to kick off a project today.. and I couldn’t for the LIFE of me.. find the VM that I was looking for.  That is just SAD.

I honestly.. currently.. only have FIVE host machines..  I think that I found the right next almost-free machine for virtualization.

Being able to get up to 32gb RAM for just another $160.. that’s what I’ve been waiting for ALL my life.  Getting it up to a couple of TB ALL SSD.. This sweet cheap server has NINE SATA ports. And PLENTY of room for expansion.

I really need to get into an E5 sometime. BAD.

The last couple of PCs I bought.. out of Portland.. 5 HP ProLiant ML150 G5 servers for $300 total.

Looking forward to spending more time on Docker. and Routing.. Proxies.


Docker 1.0 – The Biggest Disruption in Tech for 2014

I think that Docker is clearly the most important piece of software to come out in the past 20 years. I look forward to using Docker for automation, deployments, continuous integration.. Docker is a fascinating tool.. Really look forward to using it everywhere I go.



Remembering Reed Jacobson

Yes. I remember going to a sales meeting in Bellevue in July 2001… When I was in the restroom, I said something to him about how impressed I was with Analysis Services.

He ended up getting me a waiver for my first class at SqlSoft.com and I met with him once or twice after that.

He loved the Roast Beef and Chipotle from Briazz. Cheers Reed. Thanks for everything.

Aaron Kempf