Troubleshooting sshd

October 23, 2009

As per an earlier article on tunneling, I had set up some tunnels to allow secure access to my database behind the firewall. I haven’t done much local development in a while, though, so I hadn’t used the tunnel. Just recently, I discovered that it wasn’t working anymore. Digging deeper, I found that public-key authentication was failing for one particular user.

Debugging with “-v” wasn’t giving me useful information, so I figured there had to be something in the logs, and luckily I came across this post:

http://beerpla.net/2008/08/15/debugging-weird-sshd-connection-problems-what-happens-when-you-stop-sshd/

The long and short of it: just like Artem, my issue was the permissions on the home directory of the authenticating user. A quick fix to that and we were all set.

Thinking about it, I now remember that I’d been having permissions issues within my virtual hosts, so I’d “chown -R”’d them to the apache user. This ended up preventing me from reading the home directory (which happened to be the webroot) of the special user I’d created for tunneling and managing web files. Silly me.

The big lesson here: like most things on Unix – permissions should always be the first thing to check.
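
For next time, here is a rough sketch of the check I could have run first. It assumes sshd is running with its StrictModes option on, which refuses public-key logins when the user’s home directory or key files belong to someone else or are writable by group or others. The function name check_ssh_path and its output wording are made up for this example.

```shell
#!/bin/sh
# Sketch: flag paths that are likely to make sshd (StrictModes) refuse
# public-key authentication. check_ssh_path is a hypothetical helper name.

check_ssh_path() {
    path=$1
    expected_owner=$2

    if [ ! -e "$path" ]; then
        echo "MISSING $path"
        return 1
    fi

    # ls -ld prints the mode string first and the owner third
    set -- $(ls -ld "$path")
    mode=$1
    owner=$3
    status=0

    # the path should belong to the login user (root is also accepted here)
    if [ "$owner" != "$expected_owner" ] && [ "$owner" != "root" ]; then
        echo "BAD OWNER $path (owned by $owner, expected $expected_owner)"
        status=1
    fi

    # character 6 of the mode string is group-write, character 9 is other-write
    case $mode in
        ?????w*|????????w*)
            echo "TOO OPEN $path ($mode)"
            status=1
            ;;
    esac

    if [ "$status" -eq 0 ]; then
        echo "OK $path"
    fi
    return "$status"
}
```

For a user called tunnel (a placeholder name), I would run it against /home/tunnel, /home/tunnel/.ssh and /home/tunnel/.ssh/authorized_keys, passing tunnel as the second argument each time. After my “chown -R” to the apache user, the first of those would have come back as BAD OWNER.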


Getting Started with Upstart in Ubuntu

October 15, 2009

What’s Upstart

Ubuntu has shipped upstart as a replacement for init scripts since as far back as 2006, but it hasn’t really been put to use until the latest beta release of Karmic (Ubuntu 9.10).  Upstart is a more robust service-management daemon that allows for things like dependencies, custom events/triggers, pre/post initialization and resource limits, amongst other things. You can check out their home page at http://upstart.ubuntu.com.

I recently upgraded some servers to Karmic, and I decided to write a simple upstart script to start/stop my Django development server when I wanted.

The great thing about upstart is that it actually handles user configurable events, which is a super-powerful feature that I’m not really using yet, but it allows you to create chained initialization and shutdown processes.  Another great feature is the ability to run pre/post initialization tasks.  I’m using this in my example below to ensure the database is sync’d before starting up.

For my django server, though, I really just needed the bare minimum of upstart’s features. From my research so far, it looks like upstart is changing regularly, so putting too many directives in my config file might only cause problems later. Karmic is still not out of beta, as it is.

So, consider this a super-brief tutorial on how to use upstart for your own tasks, as a replacement for init scripts, or in addition to them (upstart doesn’t interfere with init scripts).

Installing/Using Upstart

Incidentally, if you haven’t got upstart on your system (you’ll know because the initctl command is missing), you can install it using apt-get or yum, depending on what system you are using. Upstart is supposedly the default in Fedora as well, but I’ve been Ubuntu-ized for a few years now, so I haven’t played with my old friend Red Hat in a while.

Once you have upstart installed, one change you’ll notice right away is that upstart uses .conf files in the /etc/init directory as scripts instead of the ones in /etc/init.d.

You run these scripts using the “start”, “stop” and “status” commands, which are shortcuts for “initctl start”, “initctl stop” and “initctl status” respectively. You can even list all services with “initctl list”, which gives you something more like a Windows Services list with statuses and PIDs.

In my example, I’ll be starting my django server with the command:

$ start django

and stopping it with:

$ stop django

The directives in your .conf files are called “stanzas”, and each type of stanza tells upstart what to do. If upstart doesn’t understand a stanza, it will behave as if the service doesn’t exist.

Pretty easy. So let’s take a look at the file.

Sample Upstart Script


# my upstart django script
# this script will start/stop my django development server
# optional stuff
description "start and stop the django development server"
version "1.0"
author "Jim Kass"


# configuration variables.
# You'll want to change these as needed
env DJANGO_HOME=/home/django/myproject
env DJANGO_PORT=8000
# bind to all interfaces
env DJANGO_HOST=0.0.0.0

# tell upstart we're creating a daemon
# upstart manages PID creation for you.
expect fork

pre-start script
    cd "$DJANGO_HOME"
    # no "exec" here: exec would replace the shell, so the emit below would never run
    /usr/bin/python manage.py syncdb
    # inside a script block, events are fired through initctl
    initctl emit django_starting
end script


script
    # My startup script, plain old shell scripting here.
    cd "$DJANGO_HOME"
    # background the server; with "expect fork" upstart follows the forked process
    /usr/bin/python manage.py runserver "$DJANGO_HOST:$DJANGO_PORT" &
    # create a custom event in case we want to chain later
    initctl emit django_running
end script

That’s it! You can actually use the shortcut stanza “exec” on a single line. chdir is actually also a stanza, but at this time it doesn’t support variable expansion, so I’ve instead used the script stanza. If you wanted to hard-code all your values and not use variables, your script could be even shorter.
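
To give an idea of what “even shorter” might look like, here is a sketch that writes a hard-coded job file with no variables and no script blocks. Treat it as untested guesswork: the job name django-simple and the function write_django_job are made up for this example, and I am assuming your version of upstart accepts the chdir stanza with a literal path. Since python stays in the foreground here, I have left out the expect stanza.

```shell
#!/bin/sh
# Sketch: write a minimal, hard-coded upstart job file.
# write_django_job and the job name "django-simple" are hypothetical.

write_django_job() {
    # $1 = directory to write into (normally /etc/init, which needs root)
    job_dir=$1
    cat > "$job_dir/django-simple.conf" <<'EOF'
description "django development server (hard-coded values)"

# chdir as a stanza is fine here because the path is a literal
chdir /home/django/myproject

# single-line exec stanza instead of a script block
exec /usr/bin/python manage.py runserver 0.0.0.0:8000
EOF
}
```

Running write_django_job /etc/init as root would create the job, after which “start django-simple” and “stop django-simple” should work the same way as the longer version above.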

See the wiki/docs here. These docs are for an older version of upstart, and there doesn’t appear to be an updated list, so I had to learn a few things from trial/error. For instance, console logged is not a valid stanza, but console output still is. Your mileage may of course vary.

For reference, the Upstart Stanza Wiki Page has a description of all the stanzas I used, and a lot more.

Upstart can do a LOT more than just start/stop things. You can chain scripts using custom events, and you can even fire events manually from initctl (useful for testing things).

As mentioned, if you have directives in your script that upstart doesn’t understand, it will tell you “unknown job:” when you try to run it. As upstart is pretty new and changing frequently, not all “stanza” directives work on all versions. Check docs for your version as needed.

Hope that helps!


My Television Princess

September 29, 2009

I’m always being asked by Christal to “blog” about her – and as this is mostly a tech blog for me, partially to help me with my own memory issues, and partially to share some of my insights about the tech world with others… I don’t have much opportunity to do that in other places, at least not without sounding too self-interested.

I essentially “left” the entertainment business some 5 or 6 times since I moved here in 1998.  That’s nearly once every other year.  More recently, I finally conceded that digital is more than just another great media format, but is finally at the point of being a viable means of distribution, and the convergence of entertainment and the internet is in process.  I jumped into the digital space, not as a content producer though as I thought I would have, but as a technologist, and a digital enabler… hoping to  leverage the new digital pipelines for distribution and production.

To that end, I owe my renewed love, appreciation and respect of entertainment (and particularly television, which is guiding its current path) to my lovely Christal.  As a guest at many major television events, including the recent Emmy awards, I’ve become increasingly comfortable amongst an ever-growing pool of the most talented, funniest and sexiest people in the world.  The difference between film and television might be the size of paychecks, or the public perception of fame, but in truth, entertainment begins and ends with broadcast.

I owe a great debt of gratitude to my television princess, who I continue to love and admire, and who provides me with so much inspiration to want to be a leader in this constantly evolving digital space.

Thank you again!


The New Analytics

September 29, 2009

Since Google acquired Urchin and eventually built Google Analytics from it, there has not been a really nice web-analytics product that was either free or extensible for web developers.  We may start out using something like Webalizer, but its lack of any actual data store makes it an ugly resource.

Recently, I came across two products that really excited me, as they both contained powerful solutions for analyzing data on your site, and in many cases they are cluster friendly, making it possible to deploy multiple “daemons” and then remotely download logs for analysis on a secondary server.

Piwik

First there is Piwik.  I came across this as it appears it is now an added value from sourceforge (along with a private Trac setup).  Piwik in their own words, “aims to be an open source alternative to Google Analytics“.

A quick browse through their site and demo reveals that they have an API for writing your own custom reporting modules, and a relatively clean interface that uses ajax extensively.  Large queries seem to be slow, and I’m guessing this is because it’s all tied to MySQL.  I can only imagine if someone wanted to re-write this product to use CouchDB how amazingly fast and powerful it could be, but that’s just another topic for another day.

I haven’t yet installed this product, as I’m pretty satisfied with what I get from Ganalytics (aside from basically giving all my data to Google), and until I have a properly clustered and load-balanced environment, I really can’t afford to have any more processor intensive activity than necessary.

Nonetheless, it’s good to know that there’s something like this out there.  If it could be modified to let multiple sites across multiple IPs write to a common database located elsewhere (or via an SSH tunnel), or if there is already a branch that does so, it could be an ideal white-labelled/free tool to offer clients in lieu of Google Analytics.

Splunk

Not long ago, while using my favorite software and systems administration support services, StackOverflow and ServerFault, I clicked on an ad for Splunk.  As its name implies, it’s a tool for doing deep dives into your log files.  Splunk is a LOT more than that, though, and it’s designed to be run on clustered systems, so multiple splunk instances can grab data from their respective servers and then feed results to a central hub where more detailed analysis can take place.

Splunk is a powerful but memory-intensive application that can search across various documents, and it has an API that allows you to build completely custom applications with their own MVC architecture, which are then added to your dashboard.  The help and initial tutorial application is one of these: basically a series of informational pages and content.

There is also a sample web analytics application that provides a completely custom interface for analyzing the data.  It’s all very cool, though maybe a bit much until you have a fully clustered system and can afford the memory tradeoff.

There’s a free and a paid version, and clustering is only supported in the paid.  It’s definitely worth exploring, but expect to spend a few hours just trying to understand the paradigm before you can do anything useful.  The main feature to note is that all datasets are generated as a result of some kind of grep/search, and you can dive as deep as you want into those datasets.

Summary

It’s a really exciting time right now to be in the web-space.  With so much of computing power moving to the cloud, and open-source API driven solutions flooding the marketplace, even the smallest web projects are becoming more sophisticated and scalable.  Much like the early days of 1.0 web, it’s quite possible to build an entirely new business model out of your garage, and grow it.  Enterprise companies which have built multi-billion dollar franchises are competing against these small mom-and-pop shops, and it all makes for a very competitive and constantly evolving technology industry.


YouTube Player Bug?

July 30, 2009
UPDATE:

Confirmed that this is a site-wide bug.  There are very few reports, but I found a few on twitter and in the bug reports section on YouTube.
Supposedly YouTube has been made aware of the problem.
UPDATE AGAIN:
As of 2:45pm, this now appears to be fixed.
—-

I’ve noticed in a recent implementation of the YouTube player API, that sometime perhaps as recently as today, the poster frames stopped appearing in videos.  I’ve seen this happen in the YouTube embedded player as well, and after doing a short dig into the issue with Firebug, it seems the culprit is that the player is requesting an image at “yimg.com/VIDEOID/hqdefault.jpg/hqdefault.jpg”.  Of course that URL is wrong, “yimg.com/VIDEOID/hqdefault.jpg” is what is needed.

I’m wondering if I’m seeing this bug because there was a recent player update that went wrong, and I just need to clear my cache or what.  These things happened at Revver from time to time… changes to the player would sometimes break the existing functionality, and a simple clearing of the cache would fix (which happens most times when people restart their computers).

For now, I’m going to wait.  I did post a note on the YouTube API users group (UPDATE: which never appeared), so we’ll see what transpires.  Also, I don’t really use YouTube that much (until recently when I started developing using their API for a client), but perhaps it’s as simple as they are going to default all poster images to the HQDefault, and there’s a typo in the new code that got released somewhere.

Odd, cause I’d expect an issue as simple as this to not pass QA.  Anyway, could just be something weird on my end – I’ll restart and see what happens tomorrow.

Here’s my example video:

Here’s what I see:

YouTube player missing poster frame

And here’s what I get when looking in Firebug at the requests:

Firebug Net output


SSH Tunneling and MySQL Remote Administration

June 29, 2009

I’ve got a few cloud servers each one of which has both an internal and external ip address.  One server has a webserver, the other has a mysql server.

The webserver connects to the mysql server on its internal ip, and in my.cnf the mysql server is bound to its internal ip accordingly.  The separation is done for two reasons: performance and cost.  The MySQL database is located on a different node, so if there are issues with the webserver or some kind of DDoS attack against it, the database server is unaffected, and vice-versa.  I get all internal bandwidth free but I have to pay for all external bandwidth, so if I were to expose the server on an external ip, I’d be paying for all data transfer between the webserver and mysql server.  Before I started this project, I was doing just that, and I realized how wasteful it was.

Furthermore, using this approach, I am able to better lock down my servers, effectively creating a local network of my servers behind a firewall.  My website ports such as 80 or 443 can be accessed from outside the firewall, but all other traffic requires an SSH tunnel via a specific host.  This is basically like creating a proxy server, but even more secure, as I only allow the intended access at the exact moment that I need it.

I’ve already implemented ssh keys, so the only way to access this on-demand proxy server is from an authorized machine, which adds yet another layer of security.

I want to be able to remotely administer the mysql server without having to rely on ssh’ing directly to the box and running command line tools.  For instance, I’d like to use MySQL Administrator on a Mac or PC.

Because the mysql server is bound to its internal ip address and not its external one (specifically because internal bandwidth is free), I can only access it by creating an SSH tunnel.

Also, I run my SSH on a custom port for added security.

On the server hosting the mysql server, I have a rule like this:


-A INPUT -d {MY_INTERNAL_IP} -p tcp -m tcp --dport 3306 -j ACCEPT

This tells my firewall to accept only requests addressed to the internal ip address on port 3306 (the default mysql port).

On the webserver, I have a similar rule:


-A OUTPUT -p tcp -m tcp --dport 3306 -j DNAT --to-destination 10.176.67.75

Now that I have the firewall set up, I can create my ssh tunnel.  I’m doing this with a command line script for simplicity, but on OS X it would be very easy to create a simple “script” that would operate much like a “connect VPN” process.

After researching online a bit, I discovered that to tunnel via SSH, you use the following command (the second part of the command just tests the db connection to make sure it worked):


# ssh -f -p{MY_SSH_PORT} -L {LOCAL_IP}:{LOCAL_PORT}:{MY_INTERNAL_IP}:{REMOTE_PORT} user@{MY_EXTERNAL_IP} sleep 60 && mysql -h {LOCAL_IP}

What this does is tell ssh to open an SSH session to MY_EXTERNAL_IP on MY_SSH_PORT, listen on LOCAL_PORT of LOCAL_IP on my own machine, and forward every connection made there to REMOTE_PORT on MY_INTERNAL_IP.

What’s happening behind the scenes is that all requests to LOCAL_IP on LOCAL_PORT travel through the encrypted SSH session to MY_EXTERNAL_IP, which in turn connects to REMOTE_PORT on MY_INTERNAL_IP on my behalf.  Going the reverse direction, any response from MY_INTERNAL_IP gets routed straight back to the LOCAL_IP.  This is a very simplified description of an SSH tunnel.  In essence you’ve created a tunnel through which all requests are securely encrypted and sent to a specific server that is sitting behind a firewall.

Note, not just anyone can create such a tunnel.  You have to be able to access the box via SSH.  As I mentioned before, I’ve locked down the system using public/private keys.  If you have not done that, then anyone who has a valid account on the server and can connect via ssh can create an ssh tunnel like this.

i.e.

# ssh -f -p12345 -L 127.0.0.1:3306:10.0.0.2:3306 dba@176.148.12.20 sleep 60 && mysql -h 127.0.0.1

For simplicity, I chose 127.0.0.1, but that has the unintended side effect of making MySQL Administrator (the gui tool) think I’m connecting to a local box, and it gets confused when it sees that the local server is off.  I’d prefer not to risk some fringe use case like this causing disastrous bugs when I update a table, etc.

So, I manually added a network interface.  This was easy to accomplish in OS X by accessing the Network Settings and then adding a new ethernet service. I named it “SSH Tunnel” and gave it an internal ip of 192.168.0.10 which was a valid but unused address on my subnet (but not one that would have any conflicts with the DHCP server on my router).

New IP Setup for SSH Tunneling

I actually didn’t give any DNS or GATEWAY information, as it was not relevant.

So now I have a new ip available on my local subnet, and I used that as my LOCAL_IP.  Despite the fact that the ip is bound to a fictional ethernet interface on my local computer, MySQL Administrator thinks it’s a remote server, so the issue is solved.

The final command:


# ssh -f -p12345 -L 192.168.0.10:3306:10.0.0.2:3306 dba@176.148.12.20 sleep 60 && mysql -h 192.168.0.10
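
Since I never remember the order of the pieces in the -L argument, here is a sketch of a small helper that assembles the command from named parts and prints it, so I can read it over before running it. The function name tunnel_cmd and its argument order are my own invention for this example; the ssh options are the same ones used above.

```shell
#!/bin/sh
# Sketch: build (but do not run) the ssh tunnel command from its parts.
# tunnel_cmd and its argument order are hypothetical, chosen for illustration.

tunnel_cmd() {
    ssh_port=$1      # custom port sshd listens on
    local_ip=$2      # address to bind on my own machine
    local_port=$3    # port to listen on locally
    internal_ip=$4   # mysql server's internal address, as seen from the remote host
    remote_port=$5   # mysql port on the internal address
    login=$6         # user@external-ip of the host I can ssh into

    printf 'ssh -f -p%s -L %s:%s:%s:%s %s sleep 60\n' \
        "$ssh_port" "$local_ip" "$local_port" "$internal_ip" "$remote_port" "$login"
}
```

Calling tunnel_cmd 12345 192.168.0.10 3306 10.0.0.2 3306 dba@176.148.12.20 prints the first half of the final command above. The “sleep 60” keeps the session open just long enough for a client to connect; ssh also has a -N option that holds the tunnel open without running any remote command, which may suit longer sessions better.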

All in all, it took me about a day’s worth of research but less than an hour to implement everything.  Now I have secure remote access to administer my database behind a firewall.  To the outside world, my database server is entirely invisible.  Behind that firewall, I can access my database without risk of being charged for bandwidth.


Google’s Research Library

June 9, 2009

Did you know that Google maintains a library of research papers written by Google employees? The library contains information that reveals some of the very cool things that are happening at Google on a regular basis.

http://research.google.com/pubs/papers_by_year.html

I came across this when I was researching “Big Table”, but there are papers on everything here – including facial recognition software for online videos, and search engine methodology. Some of the links go to the google search index, and others to the research database and/or directly to articles in PDF format.

If you want to remain on the cutting edge, checking this page often would be highly desirable I’m sure.


Trying TextMate

June 9, 2009

I’ve started using a new editor, “TextMate”. I’m sure many developers are already familiar with it on OS X.

I’m interested in comparing it with similar power editors such as Komodo Edit, and so far, one very interesting feature is the “Bundles”. These Bundles act as miniature application stacks within the primary editor, and they provide the ability for templates as well as more powerful features such as actual blogging.

I am presently writing this blog from within TextMate, and I can publish it directly to my blog via XMLRPC. That’s a pretty powerful feature, especially for writing technical blogs.

Anyway, there are many other ways to approach writing blogs – including using Microsoft Office (oh, wait – not if you have a Mac), but for simple text-only entries, with no frills, this seems to be pretty useful.

TextMate actually costs money, whereas Komodo Edit is free, so it’s still a tough sell, but for the next 30 days, I’ll give it a full test and post any interesting results here.


Microsoft (Worse Than Failure) Live

June 9, 2009

Yet another reason why I think Microsoft fails.

I recently wanted to use Microsoft’s Office Live product, instead of Google Docs, but despite the fact that it says it’s compatible with OSX, apparently my version of Firefox on Leopard is unsupported. Sorry, Microsoft, guess I won’t be using your product after all.

officeliveonmac

What a waste. Cause to be frank, Google Doc’s spreadsheet product kinda sucks.

Oh well.


Connecting With Facebook. Why is this so difficult?

May 18, 2009

I installed a WordPress.com-developed facebook application today so that I could update my facebook friends on posts I’ve made here.  But apparently this is rocket science: even after installing the application and waiting nearly 5 solid minutes for it to load, there’s nothing in the interface that clarifies how it works, or how to perform the mind-blowingly simple task of adding the link and title of a new blog post to my facebook feed.  This is exactly what Facebook Connect is supposed to accomplish, and it seems to work rather easily with the other companies out there.  I was able to set up Netflix in no time at all, so you can all start to share in the joy of either reading my reviews on movies, or otherwise seeing the ratings I have given.

So, hopefully, if all goes well, this post will make its way onto my facebook page somehow.  If not, and anyone out there can recommend a proper solution to this problem, I’m all ears.