Monday, September 30, 2013

My Top 5 Companies

One of the most valuable pieces of career advice I've ever received was from a friend who told me that I should get a list of the Top 5 Companies that I want to work for.

I thought it was a little silly at the time. After all, that wasn't how the job market worked, was it? There were ads in the classified section or on job boards. You searched for the position you wanted, not the company itself.

But then he said something very profound: If you choose a company because of a job, and that job changes or goes away, your entire reason for being in the company is gone. If you choose a job because of a company, you'll want to grow with the company and to stay at the company longer.

It took several years before I realized the true meaning of those two sentences. So, I took his advice and started to think about the companies that I'd long admired. I looked back at my life, my interests and passions and thought about the companies that I could bring those passions to on a daily basis. The results have changed over the years, of course, and the number '5' has always been more of a guideline than a hard limit. Without further delay...

Microsoft

This should come as no surprise to anyone who's ever met me. I've always been an early adopter of Microsoft technology, whether it was an operating system, game console or telephone. While some people think of Microsoft as the "Evil Empire", I see what the company was (and is) trying to accomplish by integrating technology into everyday life. If anything, I would like them to take a cue from Google and get back to their innovative roots.

I had the opportunity to visit the Redmond campus back in 2001 for a pre-release retail training for Windows XP. From the moment I walked through the front doors until long after I'd left that day, all I could think was "This is where I want to be." Microsoft has the ability to be a game changer in whatever sector it decides to be in. I want to be part of it.

The Walt Disney Company

Again, this should come as no shock. Disney, along with Microsoft, is one of the two companies to have been on this list since the very beginning. 

I first visited Disney World when I was two years old, and have probably walked through the main gates over 100 times. That's not bad for someone who's only lived in Florida for the past four years. We would take the yearly 1000+ mile pilgrimages riding in the back seat of the car from Montreal or Pennsylvania to someplace in the middle of nowhere, Florida. I would get engaged there, and married there. Disney made my spirit whole again after the events of 9/11. Disney was more than a couple of theme parks and some movies.

Disney has always been a place of Magic. I've seen kids of all ages light up when they see Mickey. Never mind that that mouse is, in all likelihood, a young lady in a suit. It's what Mickey Mouse stands for that makes all the difference. It was Walt's dream of a better tomorrow. It was his dream of E.P.C.O.T. (Experimental Prototype Community Of Tomorrow). It is a list of achievements and dreams far too long to recount. It's something bigger.

Tesla / SpaceX

Elon Musk - enough said?

I want... no, I NEED to be part of a vision, something that is bigger than myself. I can think of no bigger visionary in the world than Elon Musk.

Couple the leadership factor with the absolutely amazingly cool toys I'd get to work with at either Tesla or SpaceX and it's easy to see why this would be a dream job.

Amazon

Everyone knows about the book store and the Kindle. But did you know that Amazon Web Services has a larger share of the cloud server market than Microsoft? Amazon Prime has been making headway in allowing customers to get their TV programming without having cable (something near and dear to my heart). Amazon Fresh is providing grocery store delivery in certain markets.

Amazon has been on the cutting edge of consumer innovation for its entire history. It's a great example of a company that is looking at the problems facing the average consumer and asking "How can we fix this?"


Target

Yes, Target. Not in the stores, mind you. I've already done that. I want to be in Analytics. 

Target made news with their analytics team last year when they determined (correctly) that a teenage girl was pregnant before her family knew. The company had to make some adjustments and change some of its policies after that incident so that this sort of delicate situation wouldn't occur again. But the cat was out of the bag. The possibilities for 'Big Data' were out there for the world to see. Some people were appalled at the invasion of privacy.

I was excited. If a retail company, something I knew a lot about, could predict consumer buying patterns that well, the possibilities were endless. Never again could a company fail to understand its customer. Companies could target ads and products directly to their consumer. THEIR consumer - not just any consumer that walked through the door. Here was an intellectual solution to a problem that I'd dealt with on a daily basis at Circuit City. That one news report was instrumental in shaping my future career choice. 


OK, so this is more than five. But there are more that are very high on my list as well: IBM, Electronic Arts, Eidos, BMW, and Lotus Engineering.

Next post, I'll return to Part 2 of my Journey post: My Long List of Graduate Schools

Wednesday, September 25, 2013

Thoughts on the Next Stage of the Journey

This will be my first post which really points to where I thought the blog would primarily be focused. It's about the future - specifically mine as it relates to school and career options.

I'm (finally) graduating in 79 days, not that I'm counting down or anything. I will be getting a Bachelor of Science in Business Administration from the University of Florida - Go Gators! After so many stops and starts in my college career, having taken classes in more schools than I'd care to admit, I'm finally at the end. I've succeeded beyond my most optimistic predictions, and it's time for the reward.

I'm looking forward to walking across that stage and receiving my empty diploma case (seriously? What's up with that?). I'm ready to start the next stage of the journey.

More classes.

Meet the new boss, same as the old boss.

After so many, many, many years trying to get one degree, why would I immediately want to get another?

I actually like school. I never would have believed those words would come out of my mouth until I got to the University of Florida. For so long I've had absolutely crappy professors who couldn't care less about the students they were churning through their poorly thought out classes. Sure, I've had some incredible ones, but I could really count them on one hand before UF and still have fingers left over. What's worse is that all of them were back in high school.

Florida is different though. Not all of the professors are amazing. There have been a couple of stinkers in the program. But so many were the kind of "Dead Poets Society" professors that really bring out the best in you. They challenge you to do more than you thought you could. There's also a great support network of students who really want to succeed. And that's not an environment I ever want to lose.

So I've determined that I'm going to apply to graduate schools. I'll take a breath after graduation. Go on a cruise and enjoy Christmas. Find a better job because now I can check that box that says "Degree?". But then it's time to study for the GRE or possibly the GMAT.

Now I just have to pick a program.

79 days. #GatorGrad2013

Tuesday, September 24, 2013

Batman: Arkham Origins

http://blog.us.playstation.com/2013/09/24/batman-arkham-origins-playstation-exclusive-knightfall-pack-detailed/


Seeing Adam West jumping around, being all acrobatic is just a little weird. What's next, Burt Ward in his little green undies?

Don't know how I feel about the Joker not being voiced by Mark Hamill.

That being said, Jean Paul's suit looks totally badass!


Wednesday, September 18, 2013

Microsoft's Answer to Siri: Cortana

Cortana is Microsoft's Answer to Siri



Microsoft has announced that it is developing a virtual assistant to rival Apple's Siri. The code name? Cortana.

Microsoft has already planned an upgrade to Windows Phone, and with the Xbox One in the works, this could be a big deal! With Siri being extremely popular, but not necessarily extremely smart, having a "virtual intelligence" of its own could be a boon for the struggling phone software.

Steve Ballmer had this to say:
"Our UI will be deeply personalized, based on the advanced, almost magical, intelligence in our cloud that learns more and more over time about people and the world. Our shell will natively support all of our essential services, and will be great at responding seamlessly to what people ask for, and even anticipating what they need before they ask for it."

Imagine if that UI had the same voice and personality as the Master Chief's sexy sidekick. Sales from the geek segment would probably double the number of Windows Phones on the market by themselves.

All jokes aside... I'm pretty excited, and I don't even like Halo.

Wednesday, September 11, 2013

Analytics and Data Scientist Interview Questions

As I finish up my undergraduate degree in Business from the University of Florida (Go Gators!) and begin the preparation for a graduate degree in Information Technology, I'm starting to look at what is involved in becoming a data analyst. It's my hope that I can narrow down the myriad of options available and find something where I can really make an impact.

I stumbled upon this list of interview questions for senior level data scientists at Data Science Central. Some of these questions are things that we're going over right now in my Operations Management class at UF. It's interesting how detailed the questions are for a Director position.

We are now at 86 questions. These are mostly open-ended questions, to assess the technical horizontal knowledge of a senior candidate for a rather high level position, e.g. director.
  1. What is the biggest data set that you processed, and how did you process it, what were the results?
  2. Tell me two success stories about your analytic or computer science projects? How was lift (or success) measured?
  3. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
  4. What is: collaborative filtering, n-grams, map reduce, cosine distance?
  5. How to optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?
  6. How would you come up with a solution to identify plagiarism?
  7. How to detect individual paid accounts shared by multiple users?
  8. Should click data be handled in real time? Why? In which contexts?
  9. What is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?
  10. What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with SQL or other languages? Which languages would you choose for semi-structured text data reconciliation? 
  11. How do you handle missing data? What imputation techniques do you recommend?
  12. What is your favorite programming language / vendor? Why?
  13. Tell me 3 things positive and 3 things negative about your favorite statistical software.
  14. Compare SAS, R, Python, Perl
  15. What is the curse of big data?
  16. Have you been involved in database design and data modeling?
  17. Have you been involved in dashboard creation and metric selection? What do you think about Birt?
  18. What features of Teradata do you like?
  19. You are about to send one million emails (marketing campaign). How do you optimize delivery? How do you optimize response? Can you optimize both separately? (answer: not really)
  20. Toad or Brio or any other similar clients are quite inefficient to query Oracle databases. Why? What would you do to increase speed by a factor of 10, and be able to handle far bigger outputs?
  21. How would you turn unstructured data into structured data? Is it really necessary? Is it OK to store data as flat text files rather than in an SQL-powered RDBMS?
  22. What are hash table collisions? How are they avoided? How frequently do they happen?
  23. How to make sure a mapreduce application has good load balance? What is load balance?
  24. Examples where mapreduce does not work? Examples where it works very well? What are the security issues involved with the cloud? What do you think of EMC's solution offering a hybrid approach - both internal and external cloud - to mitigate the risks and offer other advantages (which ones)?
  25. Is it better to have 100 small hash tables or one big hash table, in memory, in terms of access speed (assuming both fit within RAM)? What do you think about in-database analytics?
  26. Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
  27. Have you been working with white lists? Positive rules? (In the context of fraud or spam detection)
  28. What is star schema? Lookup tables? 
  29. Can you perform logistic regression with Excel? (yes) How? (use linest on log-transformed data)? Would the result be good? (Excel has numerical issues, but it's very interactive)
  30. Have you optimized code or algorithms for speed: in SQL, Perl, C++, Python etc. How, and by how much?
  31. Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
  32. Define: quality assurance, six sigma, design of experiments. Give examples of good and bad designs of experiments.
  33. What are the drawbacks of general linear model? Are you familiar with alternatives (Lasso, ridge regression, boosted trees)?
  34. Do you think 50 small decision trees are better than a large one? Why?
  35. Is actuarial science not a branch of statistics (survival analysis)? If not, how so?
  36. Give examples of data that does not have a Gaussian distribution, nor log-normal. Give examples of data that has a very chaotic distribution?
  37. Why is mean square error a bad measure of model performance? What would you suggest instead?
  38. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything? Are you familiar with A/B testing?
  39. What is sensitivity analysis? Is it better to have low sensitivity (that is, great robustness) and low predictive power, or the other way around? How to perform good cross-validation? What do you think about the idea of injecting noise in your data set to test the sensitivity of your models?
  40. Compare logistic regression w. decision trees, neural networks. How have these technologies been vastly improved over the last 15 years?
  41. Do you know or have you used data reduction techniques other than PCA? What do you think of step-wise regression? What kind of step-wise techniques are you familiar with? When is full data better than reduced data or sample?
  42. How would you build non parametric confidence intervals, e.g. for scores? (see the AnalyticBridge theorem)
  43. Are you familiar either with extreme value theory, monte carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
  44. What is root cause analysis? How to identify a cause vs. a correlation? Give examples.
  45. How would you define and measure the predictive power of a metric?
  46. How to detect the best rule set for a fraud detection scoring technology? How do you deal with rule redundancy, rule discovery, and the combinatorial nature of the problem (for finding optimum rule set - the one with best predictive power)? Can an approximate solution to the rule set problem be OK? How would you find an OK approximate solution? How would you decide it is good enough and stop looking for a better one?
  47. How to create a keyword taxonomy?
  48. What is a Botnet? How can it be detected?
  49. Any experience with using APIs? Programming APIs? Google or Amazon APIs? AaaS (Analytics as a service)?
  50. When is it better to write your own code than using a data science software package?
  51. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How to efficiently represent 5 dimensions in a chart (or in a video)?
  52. What is POC (proof of concept)?
  53. What types of clients have you been working with: internal, external, sales / finance / marketing / IT people? Consulting experience? Dealing with vendors, including vendor selection and testing?
  54. Are you familiar with software life cycle? With IT project life cycle - from gathering requests to maintenance? 
  55. What is a cron job? 
  56. Are you a lone coder? A production guy (developer)? Or a designer (architect)?
  57. Is it better to have too many false positives, or too many false negatives?
  58. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples. 
  59. How does Zillow's algorithm work? (to estimate the value of any home in US)
  60. How to detect bogus reviews, or bogus Facebook accounts used for bad purposes?
  61. How would you create a new anonymous digital currency?
  62. Have you ever thought about creating a startup? Around which idea / concept?
  63. Do you think that typed login / password will disappear? How could they be replaced?
  64. Have you used time series models? Cross-correlations with time lags? Correlograms? Spectral analysis? Signal processing and filtering techniques? In which context?
  65. Which data scientists do you admire most? Which startups?
  66. How did you become interested in data science?
  67. What is an efficiency curve? What are its drawbacks, and how can they be overcome?
  68. What is a recommendation engine? How does it work?
  69. What is an exact test? How and when can simulations help us when we do not use an exact test?
  70. What do you think makes a good data scientist?
  71. Do you think data science is an art or a science?
  72. What is the computational complexity of a good, fast clustering algorithm? What is a good clustering algorithm? How do you determine the number of clusters? How would you perform clustering on one million unique keywords, assuming you have 10 million data points - each one consisting of two keywords, and a metric measuring how similar these two keywords are? How would you create this 10 million data points table in the first place?
  73. Give a few examples of "best practices" in data science.
  74. What could make a chart misleading, difficult to read or interpret? What features should a useful chart have?
  75. Do you know a few "rules of thumb" used in statistical or computer science? Or in business analytics?
  76. What are your top 5 predictions for the next 20 years?
  77. How do you immediately know when statistics published in an article (e.g. newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject? For instance, what do you think about the official monthly unemployment statistics regularly discussed in the press? What could make them more accurate?
  78. Testing your analytic intuition: look at these three charts. Two of them exhibit patterns. Which ones? Do you know that these charts are called scatter-plots? Are there other ways to visually represent this type of data?
  79. You design a robust non-parametric statistic (metric) to replace correlation or R square, that (1) is independent of sample size, (2) always between -1 and +1, and (3) based on rank statistics. How do you normalize for sample size? Write an algorithm that computes all permutations of n elements. How do you sample permutations (that is, generate tons of random permutations) when n is large, to estimate the asymptotic distribution for your newly created metric? You may use this asymptotic distribution for normalizing your metric. Do you think that an exact theoretical distribution might exist, and therefore, we should find it, and use it rather than wasting our time trying to estimate the asymptotic distribution using simulations? 
  80. More difficult, technical question related to previous one. There is an obvious one-to-one correspondence between permutations of n elements and integers between 1 and n! Design an algorithm that encodes an integer less than n! as a permutation of n elements. What would be the reverse algorithm, used to decode a permutation and transform it back into a number? Hint: An intermediate step is to use the factorial number system representation of an integer. Feel free to check this reference online to answer the question. Even better, feel free to browse the web to find the full answer to the question (this will test the candidate's ability to quickly search online and find a solution to a problem without spending hours reinventing the wheel).  
  81. How many "useful" votes will a Yelp review receive? My answer: Eliminate bogus accounts (read this article), or competitor reviews (how to detect them: use taxonomy to classify users, and location - two Italian restaurants in same Zip code could badmouth each other and write great comments for themselves). Detect fake likes: some companies (e.g. FanMeNow.com) will charge you to produce fake accounts and fake likes. Eliminate prolific users who like everything, those who hate everything. Have a blacklist of keywords to filter fake reviews. See if IP address or IP block of reviewer is in a blacklist such as "Stop Forum Spam". Create honeypot to catch fraudsters.  Also watch out for disgruntled employees badmouthing their former employer. Watch out for 2 or 3 similar comments posted the same day by 3 users regarding a company that receives very few reviews. Is it a brand new company? Add more weight to trusted users (create a category of trusted users).  Flag all reviews that are identical (or nearly identical) and come from same IP address or same user. Create a metric to measure distance between two pieces of text (reviews). Create a review or reviewer taxonomy. Use hidden decision trees to rate or score review and reviewers.
  82. What did you do today? Or what did you do this week / last week?
  83. What/when is the latest data mining book / article you read? What/when is the latest data mining conference / webinar / class / workshop / training you attended? What/when is the most recent programming skill that you acquired?
  84. What are your favorite data science websites? Who do you admire most in the data science community, and why? Which company do you admire most?
  85. What/when/where is the last data science blog post you wrote? 
  86. In your opinion, what is data science? Machine learning? Data mining?
  87. Who are the best people you recruited and where are they today?
  88. Can you estimate and forecast sales for any book, based on Amazon public data? Hint: read this article.
  89. What's wrong with this picture?
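
Question 80 is one of the few on the list that describes a concrete algorithm, so here is a minimal Python sketch of it. This is only one possible reading of the hint about the factorial number system, not the answer the original list had in mind. The function names `int_to_permutation` and `permutation_to_int` are made up for illustration, and the sketch assumes integers are numbered from 0 to n! - 1 rather than from 1 to n!.

```python
from math import factorial


def int_to_permutation(k, n):
    """Encode integer k (0 <= k < n!) as a permutation of 0..n-1."""
    if not 0 <= k < factorial(n):
        raise ValueError("k must satisfy 0 <= k < n!")
    available = list(range(n))  # elements not yet placed
    perm = []
    # Peel off factorial-base digits, most significant first.
    for i in range(n - 1, -1, -1):
        digit, k = divmod(k, factorial(i))
        # The digit selects which remaining element comes next.
        perm.append(available.pop(digit))
    return perm


def permutation_to_int(perm):
    """Decode a permutation back into its integer (the reverse algorithm)."""
    n = len(perm)
    available = sorted(perm)
    k = 0
    for i, item in enumerate(perm):
        # The digit is the rank of this item among the elements still unused.
        digit = available.index(item)
        available.pop(digit)
        k += digit * factorial(n - 1 - i)
    return k


if __name__ == "__main__":
    for k in range(factorial(3)):
        print(k, int_to_permutation(k, 3))
```

Each integer maps to exactly one permutation and back again, which is the one-to-one correspondence the question mentions. The list operations make this quadratic in n, which is fine for a whiteboard answer but not for very large n.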

Tuesday, September 10, 2013

I Expect More

I expect more.

I expect more out of myself. More out of others. Certainly more out of the people and companies I am giving money to.

Does that make me a "hater?" No. I believe it just makes me difficult to impress.

When did that become a bad thing?

As consumers in this country, we've become too complacent. We've essentially become lemmings chasing each other off the cliff of consumerism. We're all so locked into what the new version of Product X (or 'i') is that we've lost sight of what really is innovative. The American Dream is not about who can build a better product anymore. It's about who can wrap the same old stuff in a fancy new colored plastic shell. The sad part is that the American public who lost sight of their American Dream is only too happy to stand in line and buy it.

I've mentioned before that I don't understand the concept of waiting in line for the privilege of buying something first. My time is more valuable than that. I should hope that anyone old enough to have a job or family could find better things to do as well.

What really is confusing to me is how consumers will flock to purchase "innovative" products that do nothing more than rehash old ideas. Even worse are the products that show off their technical specs like they are the best thing since sliced bread.

Case in point: This slide from the Apple iPhone 5s launch.


What does this mean to the average consumer?

Nothing.

When was the last time someone asked you how many transistors or floating point registers your phone had? Don't you wish your phone had a "modern instruction set?" Have you heard the latest pick up line: "Hey baby, what's your die size?" No? Neither has anyone else on the planet.

Disruptive Technology

Think back to the first time you saw the Mac G4. Or iTunes. Or the iPod. Or the iPhone. Or the iPad. Those products were beautifully designed, simple and intuitive to operate, and, most importantly, disruptive.

What is disruptive technology? It's something that you never knew you couldn't live without. All of those products changed the way we went about our day. They changed the way we worked and played. They may not have been the first of their type on the market. Certainly there were mp3 players before the iPod. There were smartphones before the iPhone. But they made the technology more accessible to the masses. And yes, they even made us a little cooler.

What was Apple's last disruptive product? It wasn't the iPhone 5. You could say Siri, but they acquired that technology by buying another company. Even CNN noted that Apple is having difficulty innovating. Wired wrote that the reputation for innovation is Apple's greatest liability. Apple entering the market in China is a great story, but it's a business story. Apple is a tech company, and to keep the lead in technology you have to keep the geeks happy. Ignore them (us), or let your product grow stale and you risk losing it all. See: Microsoft, Sony, Nokia, Motorola, Kodak, Polaroid, etc.

Stagnation

The important point I'm trying to make is that Apple isn't alone in growing stale. Many businesses, across all sectors, have slowed down and allowed their competition to pass them by. But, unlike most other companies, Apple hasn't been criticized as much as, say, Microsoft has for not leading the way. It seems Apple has a loyal following (fanboys) to shield them from harm.

The trouble with a rabid fan base is that it generally grows up at some point. I don't mean that in a sense of maturity, but rather their lives cease to revolve around a product or company. They get families, or full-time jobs, or whatever other responsibilities life throws their way. To illustrate my point, look at the average age of the customers waiting in line at the Apple store in NYC. I don't see a single person that looks out of college. Sooner or later, without attracting new customers, the rabid user base will die off.


Was the iPhone 5s / 5c Really Worth 90 Minutes of My Life?

Was it just me, or were you also watching the announcement just waiting for the "Oh, here's one more thing..." moment? Google did it with the Chromecast recently. That was a great Steve Jobs-esque display of showmanship. Today, instead of the iWatch, the iTV or any other iThing we got Elvis - and not the cool Elvis either.

So to recap, here's what we got:
  1. Plastic iPhone 5c in various pastel colours
  2. The Gold, Silver or Black-but-not-really-black iPhone 5s
  3. The iPhone 5s has a dual-core processor that's 64-bit now (whee!!!!)
  4. Fingerprint sensor
  5. A better camera that doesn't have more megapixels, it has BIGGER pixels
  6. Elvis. Costello, that is. 
Color me unimpressed.