(it's not the colors that matter)
Ben Karel

Moved (back) to WordPress [Mar. 11th, 2009|04:00 pm]
Since LiveJournal has been showing increasingly flashy and annoying ads, I'm moving this thing back to a hosted blog service. WordPress has a pretty nice LiveJournal importer, so I'm going to give wordpress.com a try. Whatever service I'm using will be linked to from http://eschew.org/blog/.

An Unexpected Side Effect of Combining C++ Features [Mar. 1st, 2009|12:40 am]
So, a pop quiz. Suppose you write a C++ program using class B provided in a library, like so:

#include "some-header.h"
int main() {
  B* p = new B();
  return 0;
}


The snippet above compiles and links just fine. So the snippet below should build too, right? Mind you, the base class is not doing anything sneaky. In particular, it does not have a private destructor, or anything else designed to interfere with derivation.


#include "some-header.h"
class D : public B {};
int main() {
  D* p = new D();
  return 0;
}


The answer to the quiz is that this may or may not build. It will compile, but you could get unresolved symbol errors during linking. How and when, you ask? It happens when you're linking against a dynamically linked library (at least on Windows) and B has a protected virtual method not exported from the DLL. The explanation (as far as I can reason) hinges on the fact that the .lib for a DLL contains only the symbols marked for export. When you're using the class directly, your compilation unit can't access the protected virtual symbol, so the linker doesn't look for it. But when you use the derived class you just defined, well, D can now access that protected virtual method in B, so the linker needs to find the method's symbol. A likely mechanism is that D's vtable gets emitted in your own object file, and that table needs the address of every virtual method D inherits from B, exported or not. Since the symbol is not there, boom.

This situation involves no less than four distinct features coming together for the express purpose of saddening you: separate compilation, method access, inheritance, and custom symbol visibility.

Updated Reddit Highlighter Greasemonkey Script [Feb. 8th, 2009|09:45 pm]
Quick PSA: I've updated the Reddit Highlighter Script to work with the changes Reddit made to their site structure a while back. I've also added highlighting of stories based on linked domain name or link title, the better to not miss anything related to Diablo III. One can never be too prepared, eh?

C/C++ Preprocessor Macro Idioms [Feb. 5th, 2009|01:14 am]
I've been tormented the past few days about an article I read several years ago. I barely even remember what it was about, only that the author played very clever tricks with the preprocessor's token pasting functionality to push the limits of what the preprocessor can do. Maybe it was a parameterized repetition macro? Something like that. I do remember it took advantage of the asymmetry in expansion between function-like and object-like macros.

Anyways, I've been poking around the (very nice!) Chromium source code. One thing I took a close look at is how Google organized their logging macros. In the process, I noticed two tricks they used that I think are effectively preprocessor idioms, and since Google doesn't turn up much for that search phrase, I figured I'd document what I saw.

The external interface of the Google logging system is pretty simple: the LOG macro takes a severity parameter, such as INFO or ERROR, and expands into a temporary object that derives from std::ostream. The tricks have to do with the severities: the tokens used as severities aren't #defined to the preprocessor! Instead, whatever is passed in is pasted onto a longer token that expands to an object declaration with the desired parameters. This lack of actual #define statements effectively gives the severity tokens a specific local scope, even though the preprocessor deals only with explicitly global scope. And the one-to-many trick effectively forms a preprocessor-style switch() construct, where the macro parameter can affect the expansion of the rest of the macro.

Incidentally, now that I look, the Boost preprocessor library uses the same switch() trick to implement BOOST_PP_IF. Nifty.

Patrick Awuah: Educating a new generation of African leaders [May. 25th, 2008|10:20 pm]
Why We Program:
The ability to create is the most empowering thing that can happen to an individual.
-- Patrick Awuah

The J Programming Language, also known as Oh God My Eyes Are Bleeding [May. 5th, 2008|02:18 am]
[music |Nickel Creek - House Of Tom Bombadil]

So, back towards the beginning of the semester, we had a simple assignment in Data Compression: compute pixel differences for an image file. The differences will be defined "going backwards" -- the i'th difference will be the i'th value, minus the (i-1)th value (as opposed to the (i+1)th). So, including a "virtual" pixel value of 128 at the beginning, a list of numbers such as

0 1 4 4 3 10

would be transformed into the list

-128 1 3 0 -1 7. Because these are really 8-bit unsigned char values, this is equivalent to 128 1 3 0 255 7.

In a mainstream, Algol-derived language like C or Python, this problem has a simple, obvious solution that is intelligible even to programmers who don't know the language. Something like this:


import sys

if len(sys.argv) != 2:
    print "Error! Must give an input filename."
    print "\tUsage: python imgdiff.py in_file"
    sys.exit(1)

infile_name = sys.argv[1]
outfile_name = infile_name + ".diff"


infile = open(infile_name, "rb")
data = infile.read() # Read all bytes from file
infile.close()

data = chr(128) + data # Prepend pseudo-pixel for diff. calculation

# Calculate the differences between each pixel in the data
diff_at = lambda i: ord(data[i]) - ord(data[i-1])
diff = [chr( diff_at(i) % 256 ) for i in range(1, len(data))]

outfile = open(outfile_name, "wb")
outfile.writelines(diff) # Write the difference values to the output file
outfile.close()



This Python program is short and simple. It has six blocks, half of which are just one line. For the non-programmers who might be interested in commentary (hi Joe!): The first "big" block ensures that we have an input file to read values from and a file to write to; the second block just reads the values in. The line data = chr(128) + data just adds the value 128, our virtual pixel value, to the beginning of the list. The next line is a direct translation of our definition of a difference pixel, given above, into Python syntax. The trickiest line is the one defining diff, because it makes use of a list comprehension. From the inside out, chr( diff_at(i) % 256 ) computes the i'th pixel difference, modulo 256 (the number of distinct values a one-byte character (chr) can hold, so results stay between 0 and 255). That computation is repeated for every pixel in the file, and the resulting list is assigned to diff. Finally, the last block writes the differences to the output file.
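For readers on Python 3, where file contents arrive as bytes rather than str, the same calculation might be sketched roughly as follows. The name pixel_diffs and its virtual parameter are made up for illustration; they are not part of the program above.

```python
def pixel_diffs(data: bytes, virtual: int = 128) -> bytes:
    """Return backward differences of 8-bit pixel values, modulo 256.

    `virtual` is the pretend pixel placed before the first real one.
    """
    previous = virtual
    out = bytearray()
    for value in data:                      # iterating over bytes yields ints 0..255
        out.append((value - previous) % 256)
        previous = value
    return bytes(out)


# The worked example from the assignment description.
print(list(pixel_diffs(bytes([0, 1, 4, 4, 3, 10]))))
# prints [128, 1, 3, 0, 255, 7]
```

Reading the input with open(name, "rb").read() and writing the result with open(name + ".diff", "wb").write(...) would complete the round trip.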

So that's nice: simple, clean, easy, boring. I wanted something more... esoteric.

Usually, when a computer science class gets an assignment, the instructor picks the language the students use, but for this assignment, we were given free rein to pick any language we wanted to get the job done. Since this particular professor puts a lot of emphasis on program readability, I thought it would be amusing to see the expression on his face when presented with a valid solution in a completely unintelligible language. But which language to use?

Perhaps the most amusing might have been a language called Whitespace (in which a valid solution would be a blank sheet of paper... or maybe two blank sheets of paper), but that would have been pushing it, even for me. But then I remembered reading something about J, somewhere. I thought it had been cited in an exhortation from Steve Yegge for programmers to learn a wider variety of languages, but I can't seem to find it now.

So, J.

Here's the equivalent program in J:

((256|(2-~/\])128,a.i.1!:(2}ARGV)){a.)1!:3(3}ARGV)

Fun, eh?

Here's an easier-to-understand version:

infile  =: 2}ARGV            NB. grab command line args
outfile =: 3}ARGV
chars   =: 1!:1 infile       NB. 1!:1 means read contents of file
nums    =: a. i. chars       NB. convert chars to ascii indices
diff    =: 2 -~/\ ]          NB. uhh...
diffs   =: 256|diff 128,nums
(diffs {a.) 1!:3 outfile


Okay, so that's not THAT bad for the first few lines. Sure, 1!:1 is a pretty atrocious syntax for reading in file contents, but we can look beyond that for now. a. i. chars is actually sort of cool. a. is a table of the ASCII characters, chars is a list of the characters from the file, and i. is (in this context) the index-of operator. It works like this:


'abcde' i. 'bed'
1 4 3


In essence, a. i. is the J equivalent of Python's ord function, but built out of a more fundamental operator. Cool enough to forgive the, er, terse syntax. But what the heck is the next line?!? diff =: 2 -~/\ ] -- are you kidding me?

Actually, it's straight out of the J vocabulary reference. Sentences in J read right-to-left. The ] is an identity operator; it simply selects whatever comes to its right. In essence, here it stands in for the thing being diffed, much like the word "it" itself. The \ character means infix, which the vocabulary writes as x u\ y; here x is 2, u is -~/, and y is ]. I'll cover u in a second, but first, a quick illustration of \. Suppose you wanted to select every successive pair of elements from the list 1 2 3 4. Then u would simply be the identity function ], like this:

2 ] \ 1 2 3 4
1 2
2 3
3 4


So what does -~/ mean? - is the subtraction operator, simple enough. / is the insert operator, so +/ 1 2 3 is 1+2+3, and -/ 1 2 is 1 - 2. But note that we don't want 1 - 2, we want 2 - 1. That's what ~ does -- it swaps arguments, so 1 -~ 2 is the same as 2 - 1. Phew!
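For comparison, here is a rough Python emulation of what 2 -~/\ does. The helper names infix, insert, and swapped_minus are invented for this sketch, and it only covers the positive, overlapping case of \.

```python
from functools import reduce


def infix(n, u, y):
    """Apply u to each overlapping window of n consecutive items of y."""
    return [u(y[i:i + n]) for i in range(len(y) - n + 1)]


def insert(f):
    """Rough analogue of J's u/ : fold f between the items of a window.

    J inserts right-to-left; for a two-item window the direction makes no difference.
    """
    return lambda window: reduce(f, window)


def swapped_minus(a, b):
    """Analogue of -~ : subtraction with the arguments swapped."""
    return b - a


print(infix(2, list, [1, 2, 3, 4]))
# prints [[1, 2], [2, 3], [3, 4]]
print(infix(2, insert(swapped_minus), [128, 0, 1, 4, 4, 3, 10]))
# prints [-128, 1, 3, 0, -1, 7]
```

The second call reproduces the difference list from the assignment description, before the reduction mod 256.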

256| means compute values mod 256. { a. is the equivalent of the chr function in Python, converting integers back to characters. And, finally, 1!:3 prints.

Simple and intuitive, eh?

Here's another example. The task is to take a binary file and figure out the Huffman codewords encoded therein. The file format is 256 words of 4 bytes, followed by 256 sets of 1-byte lengths. Each length gives how many of the low-order bits from the corresponding word are part of the Huffman codeword.

Python, I was pleased to see, has a module called struct that is built for doing exactly this kind of bit-level interpretation. Given a string of 4 chars, the value of those chars as an integer can be had with struct.unpack("l", chars). Cool! Unfortunately, Python doesn't have built-in libraries for converting integers to binary strings. The end program ended up being about 25 lines, not including trivial things like file input.
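A compact sketch of that decoding in present-day Python 3 might look like the following. Several things here are assumptions: the function name read_codewords is made up, the words are taken to be little-endian and unsigned (the file format description doesn't say), and the "<" prefix is used so that each word is exactly 4 bytes, since a bare native "l" can be 8 bytes on some platforms. It also leans on format(n, "b"), which newer Pythons provide for binary strings.

```python
import struct


def read_codewords(blob, count=256):
    """Decode `count` 4-byte words followed by `count` 1-byte lengths.

    Assumes little-endian unsigned words; returns codewords as bit strings.
    """
    words = struct.unpack("<%dL" % count, blob[:4 * count])
    lengths = blob[4 * count:5 * count]        # iterating bytes yields ints
    codewords = []
    for word, length in zip(words, lengths):
        low_bits = word & ((1 << length) - 1)  # keep only the low `length` bits
        bits = format(low_bits, "b").zfill(length) if length else ""
        codewords.append(bits)
    return codewords


# Tiny made-up example: two words (9 and 5) with lengths 4 and 3.
blob = struct.pack("<2L2B", 9, 5, 4, 3)
print(read_codewords(blob, count=2))
# prints ['1001', '101']
```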

J fares rather better. Negative numbers passed to the infix operator give non-overlapping infixes, perfect for splitting our list of bytes into chunks of 4.
_3 ]\ 'abcdefghi'
abc
def
ghi


And J has built-in operators for converting to and from binary representation of integers. So, given a list of four integers, we can select the low, oh, four bits like this:
(-4) {. , #: 0 0 0 9
1 0 0 1


That really speaks to the conciseness of J, I think. From 25 lines of Python to one line of J.

J is like concentrated Perl, with all the sugar evaporated out. Honestly, it makes my brain hurt to look at J for too long.

Summer of Code 2008 Projects [Apr. 24th, 2008|05:55 pm]
So, the Summer of Code 2008 projects were posted a few days ago.

A few of the ones I think are particularly interesting:

PHP: Zend LLVM Extension
Vim: On-the-fly Code Checker, Visual Studio 2005/2008 Plugin
Python: Django on the JVM
Mercurial: Rebasing, svn tools, partial cloning, TortoiseHG for Linux
LLVM: C++ classes support, llvm/clang distcc, STM support
GCC: Windows Improvements, C++0x lambda functions
Boost: Multi-core/SIMD optimizations for Boost::Math and Boost::Graph
Cairo: HDR image surface type
Portland State University: A System for Patent Categorization and Analysis
X.Org: GPU-Accelerated Video Decoding

Other thoughts: Gosh, Python and KDE have a lot of projects! Looks like Cython and NumPy are popular Python subprojects. Mercurial is four for four in terms of interesting-to-me projects. Five of eight Linux projects are printing-related.

Good luck to everyone working on it this year!

Going offline [Mar. 28th, 2008|03:30 am]
In 10 hours, I'll be going across the country for a few days to see my brother and take advantage of the ridiculous amount of snow Oregon has been getting. After that, I'll be home for a few days, and then off to Canada for the ACM Programming Contest. I'll be back April 10.

Debbie Downer Injects Reality [Mar. 1st, 2008|09:48 pm]
[music |Show Me - John Legend]

According to Wikipedia, in the last ten years, approximately 96 people have been killed and 94 wounded by school shootings.

The census in 2000 showed a total of 33.86 million students at the high school level or above. That gives (190/10)/(33.86e6/1e5), which works out to a rate of approximately .056 per 100000 students being killed or injured per year.
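Spelled out step by step, the arithmetic is as follows. This is just a restatement of the expression above, using the figures quoted in this post; the variable names are mine.

```python
casualties = 96 + 94      # killed plus wounded over ten years (Wikipedia figures above)
years = 10
students = 33.86e6        # 2000 census, high school level or above

per_year = casualties / years
rate_per_100k = per_year / (students / 1e5)
print(round(rate_per_100k, 3))
# prints 0.056
```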

Incidentally, that's a little less than the rate given by the NIH for accidental alcohol poisoning deaths in 1996-1998 in the 15-24 age group. Extrapolating from NIH data, this means that, on average, 20.22 students die directly from alcohol poisoning, and 64 die directly or indirectly of alcohol poisoning. In addition, according to researchers at Boston University, among college students in 2001, there were 1349 alcohol-related motor vehicle crash deaths, plus an additional 368 nontraffic alcohol-related injury deaths.

So, as a student, I am statistically more likely to drink myself to death than to be shot by another student. And I'm 140 times more likely to be killed by drunk driving than a school shooting.

But still. University police officers will be walking around with guns. To make us safer, I'm told. If only they could save us from ourselves.

Funky Math [Feb. 14th, 2008|08:45 pm]
My sister and I have been independently growing more cognizant of money. One interesting finance site is Mint. It looks very convenient, but I feel more than a little squeamish about putting my financial information in a third party website.

Mint also has a blog that's chock-full of useful tips. In one post, they give a quick overview of the practical financial aspects of car-buying. One thing that caused me to raise an eyebrow, however, was their stance on down payments. In essence, they noted that making a higher monthly payment while keeping the "down payment" in a high-yield savings account can be better than using the down payment to decrease the size of your car loan. That's pretty unintuitive! They're saying that taking an extra $4,000 worth of car loan at 7.5% and putting $4,000 in a savings account at 4.75% is BETTER than the alternative, which is using that four thousand to "pay off" part of the higher-interest-rate loan instead of parking it in the lower-rate account. That doesn't make sense at first glance; usually, you're best off when you pay off loans in decreasing order of interest rates.

I checked their math; sure enough, you're better off taking the larger loan. Strange! I realized a few days later that, while their comparison is useful and reflects what goes on in "the real world," it's not entirely apples to apples. The reason lies not only in how much money is loaned, paid, and invested, but also when. Also, as an aside: In calculating interest on money you invest, Mint assumed no compounding. I redid the numbers to be a little more realistic, compounding once a month.

As a quick overview: In both cases the car is $20,000, and you have a 48-month repayment plan at 7.5%.

In case 1, you pay $4,000 down and pay $387 a month. $4,000 + 48 * $387 = $22,576.

In case 2, you pay nothing down, and your monthly payment is $484, and the total amount you pay at the end is $23,232, $656 more than case 1. At this point, it looks like the down payment is a good idea: you pay less money! But if you take the down payment and put it in a CD at 4.75% APY, you'll earn $835, and will come out $179 richer than case 1.
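The figures in the two cases can be reproduced with the textbook amortized-loan formula. The sketch below makes two assumptions: the loan compounds monthly at 7.5%/12, and the CD's 4.75% is treated as a nominal annual rate compounded monthly, which is what yields $835 (read as a true APY, it would come to roughly $816). The function names are mine.

```python
def monthly_payment(principal, annual_rate, months):
    """Standard amortized-loan payment with monthly compounding."""
    r = annual_rate / 12
    return principal * r / (1 - (1 + r) ** -months)


def cd_interest(deposit, annual_rate, months):
    """Interest earned on a lump sum compounded monthly at annual_rate / 12."""
    return deposit * ((1 + annual_rate / 12) ** months - 1)


case1 = round(monthly_payment(16_000, 0.075, 48))   # $4,000 down
case2 = round(monthly_payment(20_000, 0.075, 48))   # nothing down
print(case1, 4_000 + 48 * case1)
# prints 387 22576
print(case2, 48 * case2)
# prints 484 23232
print(round(cd_interest(4_000, 0.0475, 48)))
# prints 835
```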

So what gives?

Well, the thing is that case 2 assumes you have more money to start with. If you simply took a down payment, the most money you've ever invested in the car at any given point in time is $22,576, on the day you send in your last car payment. In contrast, look what happens in case 2. Your car payments total $23,232, PLUS you have to pay an ADDITIONAL $4,000 to earn interest in the CD. That means that right after you send in the last car payment, and before you close out the CD, you are $27,232 poorer than the day you signed the loan. After you close the CD, of course, your total cost comes out to a final value of $22,397 ($27,232 minus the $4,835 the CD returns).

So, then, what's the apples-to-apples comparison? You need to give yourself 27,232 - 22,576 = $4,656 more to invest over the course of the 48 months. That way, when you're done with the payments in case 1, you'll have set aside exactly as much money in total as you did in case 2. Coincidentally enough (just kidding!), $4,656/48 comes out to $97 per month, which is the exact difference between the monthly payments in the two scenarios!

This makes sense: now, in both cases, we have an initial outlay of $4,000 that we use for something-or-other, and monthly payments of $484. For our "new" case 1, we use $4,000 for a down payment, and invest the extra $97 per month this frees up. Assuming our only use of the extra money is to make deposits into our CD or savings account, we will earn $598 interest on our principal of $4,656. This is excellent: we earn less interest, but we also pay much less interest, and our total cost is a mere $21,978. Instead of being $179 richer than we were originally, we're now $598 richer. Yay! For completeness, I'd also note that this is essentially the same as ($8 less than) if we could avoid paying for the higher-interest-rate car loan by both making a down payment AND giving the higher monthly payment over a smaller number of months.

What can we conclude? The best strategy is to make as large a down payment as possible at the beginning, and then, each month, put as much leftover money as we can into a high-yield savings account. Shocking, isn't it?

As an aside, the economic cost of the car itself (ignoring practical things like title fees and insurance, as well as the monkey wrench known as inflation) is not $20,000, nor is it $21,978. The future value of the money used to pay for the car is $30,466. That's how much you could have had if you didn't buy the car in the first place. This means that the opportunity cost of the car is more than 150% of the sticker price!

Review: Founders at Work [Feb. 5th, 2008|10:47 pm]

Founders at Work: Stories of Startups' Early Days


Oops, I got sidetracked. I'd been talking about startups, and then I got distracted by the "ooh shiny" of speech recognition and ACM problems. I'm not actually done with speech recognition quite yet, but this book was in my queue first. Yeah, no more procrastinating for me! It doesn't have anything to do with the fact that the book is due back at the library in a few hours! No sirree Bob!

Founders at Work is a collection of thirty-two interviews with the founders of successful technology companies. For anyone interested in tech, tech history, or behind-the-scenes glimpses of startup life, the book offers hundreds of pages of insight, trivia, lessons learned, and advice.

A number of common threads stand out across the stories. One is that the majority of the companies took a very roundabout path towards success. Many companies started with a business plan that got thrown out at the first available opportunity. Many others started with no real business plan at all. The photo-sharing site Flickr was originally created by Ludicorp to be part of an online game, and didn't appear until two years after Ludicorp itself was started. PayPal started doing cryptography software for Palm Pilots. They did a website for one of their products as an afterthought; that website quickly defined the company. Pyra Labs, creators of Blogger, was started by friends before they had any product ideas. They eventually focused on building an ambitious collaboration tool called Pyra, and kept each other updated with a dirt-simple blogging system. In retrospect, of course, it's obvious: it was that blogging system, not Pyra, that solved a real problem in a useful way. Quite a few other products started off with one or two people scratching a personal itch. Bloglines, del.icio.us, Firefox, and Yahoo all started out as humble tools meant for nobody but their creators.

More... )

ACM Programming Contest solution walkthrough [Jan. 28th, 2008|03:35 pm]

In preparation for the ACM World Finals coming up in April, we've been running through previous years' problem sets to get a feel for what we'll be up against. It's been interesting, for the most part. The problems definitely require more up-front thought before a solution presents itself, compared to the "regular" contest problems. Since a cursory Google search didn't turn up anything useful for "ACM contest walkthrough," I thought I'd write up the solution to one of the more interesting problems we've covered.

Crowd around, young ones... )

Speech Recognition Poetry [Jan. 25th, 2008|02:47 pm]

One more bit of speech recognition poetry:
At the one who in the heart and make it home
that in his first alerted eight when he didn't have a duty to
pretend that have lent an eight-DOS attack
it has our neighbors down here were using her get her a hug her head to head the hello you don't have clearly were
the error rate has
had his day that the
Iran arms dealer who died in one of the times
the Ingraham her while using her head and all who really girl who lived in her

That's from Vista trying to transcribe a recording of my roommate's finance class lecture. The more you try to comprehend it, the less sense it will make.

But I do love that turn of phrase: "an eight-DOS attack." Heh.

cat | say | transcribe [Jan. 23rd, 2008|02:37 pm]

So, apparently I was quite remiss with my last post. I was, apparently, supposed to take my last exploration with speech recognition to its logical conclusion, and see what happens when the computer tries transcribing its own "voice."

So, we'll take a text file containing a snippet of text, run that through Say.exe to produce a WAV file, run the WAV file through Transcribe.exe to produce a text file, and compare the two text files.

Here's the introduction to Pride and Prejudice:
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.

However little known the feelings or views of such a man may be on his first entering a neighbourhood, this truth is so well fixed in the minds of the surrounding families, that he is considered the rightful property of some one or other of their daughters.


Here's what the older Windows XP recognition engine produces. As before, the lack of punctuation is expected.
A's the truth universally acknowledged that a single man in possession of the good fortune that the unwanted the wind out average little known the feelings organisms that chain and maybe I've his first entered a bridge read this to the cell wealth extend the minds of the surrounding families that he is considered the rightful property on someone or other of their daughters


And Vista's newer engine:
He is a truth universally acknowledged that a single man in possession of a good fortune
part B1 of the white pal ever little down the feeling for the use of such a manner beyond year's first entry neighborhood this truth in cell lab fixed in the mind of the surrounding family's that he is considered a rightful property at someone or other of the daughters


Interestingly, Vista's output isn't significantly better than XP's. Past experience showed that Vista transcribes more accurately given the same input. Thus, we are led to an intriguing hypothesis: in an effort to create a more natural-sounding computer voice for Vista, Microsoft also created a voice that was more difficult to automatically transcribe!
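To put a rough number on comparisons like this one, a word-level similarity score could be computed along these lines. This is only a sketch: the helper names are mine, difflib's ratio is a generic sequence-similarity measure rather than a proper word error rate, and I haven't run it on the transcripts above.

```python
import difflib
import re


def words(text):
    """Lowercase the text and keep only runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(original, transcript):
    """Ratio in [0, 1] of matching words, per difflib.SequenceMatcher."""
    matcher = difflib.SequenceMatcher(None, words(original), words(transcript))
    return matcher.ratio()


print(similarity("It is a truth, universally acknowledged",
                 "it is a truth universally acknowledged"))
# prints 1.0
```

Punctuation and capitalization are stripped first because the recognizers don't emit punctuation, so only the words themselves are compared.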

Another example, this time using Shakespeare's Sonnet 68:
Thus is his cheek the map of days outworn,
When beauty lived and died as flowers do now,
Before these bastard signs of fair were born,
Or durst inhabit on a living brow:
Before the golden tresses of the dead,
The right of sepulchres, were shorn away,
To live a second life on second head,
Ere beauty's dead fleece made another gay:
In him those holy antique hours are seen,
Without all ornament, it self and true,
Making no summer of another's green,
Robbing no old to dress his beauty new,
And him as for a map doth Nature store,
To show false Art what beauty was of yore.


I added linebreaks to Vista's output, to better enable comparisons between the two versions:
Not that his cheek and at a denny's out
120 minutes and I has flowered now
the forties bastard signs of tired were born
all orders to inhabit, leading brown
deflected golden tresses added that
the right of sepulcher sit where slightly delayed
a second light on secondhand
terribly stabbed Fleetwood another game
became those calling NT Cal sky scene
but not all or none at itself a true
B. Demille summer of another screen
prodding know all too dressy is getting you
at ms foreign at Tiffany to restore
to show false are like the meet with your


I didn't bother with XP.

Transcribe-n-Say [Jan. 18th, 2008|12:50 am]

Doing the transcriptions for Glenn Kelman's presentation earlier this week ended up taking more time than I expected it to. I can type reasonably quickly, 50 words per minute, perhaps 90 if I concentrate hard and stay away from the Backspace key. But a normal rate of speech is about 130 words per minute, so I can only transcribe about a sentence at a time. So the transcription process goes like this: Listen to audio, transcribe a sentence, seek the audio back a few seconds, repeat. Not the most exciting work around.

So I figured, hey, what about having the computer transcribe stuff for me? Windows XP itself doesn't come with speech recognition built-in, but Microsoft includes a speech recognition engine with Office XP and 2003, and also has one included as part of the free Microsoft Speech API (SAPI 5.1) SDK. And all five versions of Vista come with an updated, more-accurate engine. (For those of you keeping score at home, that would be Home Basic, Home Premium, Business, Enterprise, and Ultimate. Phew!)

Anyways, SAPI itself has a few simple functions that do most of the work for us in doing a transcription. Basically, you take a stream, bind it to a (WAV) file, pass that stream to the speech engine, enable "dictation mode," and print out what the engine thinks it hears. File to stream to engine to recognized text.

That didn't seem too complex, so I set out to write a quick C# command-line app to do that. For the most part, even though it wasn't designed with C# in mind, the COM interop story for SAPI 5.1 is pretty good. Unfortunately, the C#-language binding for the SpStream.BindToFile function is a little iffy. The C++ type signature for the file name is const WCHAR *. Somehow, that got translated into ref ushort. Not so good. I found a post from one Microsoft employee, Dave Wood, which acknowledged this problem with SAPI, and also gave suggested workarounds. However, I figured that, considering how simple the app would be, I might as well just write it in C++.

Well, I did. The hardest part turned out to be learning about the multitudes of string representations in Win32 COM programming, and how to translate between them. A little reading of documentation, sample code, and other examples online, and I had something that worked. No error handling logic in place, but it worked.

For certain values of worked, anyways. By "worked," I mean that it fed the WAV data to the speech recognition engine, and got results back. The usefulness of those results is another issue entirely.

First, the source of the WAV makes a pretty big difference in recognition accuracy. This makes a bigger difference when using the older version 5.1 recognizer on XP than Vista's newer version 8.0, but it's noticeable on both. A WAV file I created using Audacity and a speech-recognition-tuned microphone yielded decent results on XP and great results on Vista. A snippet of the audio from Kelman's presentation, converted to WAV format, was... spottier. I manually ran Audacity's "Normalize" command to make the waveform graph more similar to the mic's recording, and that improved recognition accuracy somewhat. Unfortunately, the results on XP are comical and, at times, almost poetic.

I recorded this sample WAV file, which is just me saying "Computing research has made remarkable advances, but there's much more to be accomplished. The next ten years of advances should be even more significant, and even more interesting, than the past ten," which is simply the first two sentences from the abstract of Ed Lazowska's UWash talk.

Here's what the version 5.1 recognizer thought it heard: "Computing research has made remarkable advances but there's much more of a published the next ten years of events this should be even more significant and even more interesting from the past ten." Close, but not quite. The lack of periods and punctuation is expected, but turning "to be accomplished" into "of a published," well, not so much.

Vista's version 8.0 recognizer does much better: "Computing research has made remarkable advances but there's much more to be accomplished the next 10 years of advances should be even more significant and even more interesting than the past 10." Flawless! Note that Vista is smart enough to turn "ten" into "10." Not such a big deal here, but it's much nicer to read "5,313,852" than "five million three hundred thirteen thousand eight hundred fifty-two." Anyways, I think that's pretty darn impressive for a completely untrained speech-recognition system.

Unfortunately, the results on Kelman's presentation weren't nearly as good. I took a short clip of his presentation, which I transcribe as "... man. I learned how to do everything. And, uh, a couple of years later I started Plumtree Software with a few of my friends. And if you can start a company, um, everyone will tell you it's too soon, um. And I'm sure they'll be right, there were so many things that I didn't know, ah, when I started Plumtree, and I erred egregiously, uh, and really suffered for it. But, ah, you'll never know everything..."

First, Vista's valiant attempt. It gets points, I suppose, for including recognizable phrases from the original audio. Knowing what the real transcription is, we can see a mutilated version of it here, and I think you can get a general sense of what the original audio was saying. Not all the fine points, but the general sense, sure.
A man I learned how to do everything
and got a couple years later I started onto software will hit my friends
and if you can set a company of everyone will tell you that it's too soon
hung at sure it'll be right there are some things that I didn't know how I started entree am a grievously on
and really suffered for it but the deal ever now everything


Meanwhile, here are a few of XP's attempts. These are all with the exact same input file, mind you. Same input, very different outputs.
Air guard had every hour at our doorstep start of his software NFS at its start out
that are out how data it's too soon
that ensured underwriter scientists I didn't know that I saw it and trade vendor who displayed (suffered a heart
out ever known everything


Event that my head.)
and not to use their instead of his software Jennifer and 75¢ of the
outdoor elements out in a season
that ensure the main) so it's I didn't know that aside and trade at graciously I suffered during
the demo never know everything


That men have everything at Gottschalks understand outside of his software to the outset if you can start company
founder of how the united states suing
that ensure the main) scientist I didn't know


It's really sort of poetic, in its own twisted way.




So, anyways, if you have XP or Vista, and don't mind tinkering with basically-untested software, you can download Transcribe.exe and test it yourself. XP users will probably need to download and install the SAPI SDK 5.1 in order to get a speech recognition engine installed. Users of both XP and Vista may (or may not) need to install the Visual Studio 2005 C Runtime components, depending on what other software you've already got installed.

Usage is simple: pass in the name of a WAV file to read from, and the name of a file (probably .txt) to output the results to. On my computer, a 2.0 GHz Core 2 Duo, it transcribes at roughly 3x realtime.

Having the ability to transcribe without easy speech synthesis felt like having yin without yang. So, enter Say.exe. It's even simpler than Transcribe. It, too, is a command-line application.
say.exe usage
If the first argument is a file, the other arguments are ignored and whatever the file contains is passed to the default TTS voice to speak. If the first argument is not a file, whatever arguments are provided are passed to the default TTS voice to speak.
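
For the curious, here's roughly what that argument handling looks like. This is a C++17 sketch of the rule described above, not Say.exe's actual C# source: the textToSpeak name is made up for illustration, treating "is a file" as "names an existing regular file" is my assumption, and the call into the TTS voice is left out entirely.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper (not taken from Say.exe): given the command-line
// arguments after the program name, return the text to hand to the voice.
std::string textToSpeak(const std::vector<std::string>& args) {
    if (args.empty()) {
        return "";
    }

    // Case 1: the first argument names an existing regular file.
    // Speak the file's contents and ignore every other argument.
    std::error_code ec;  // non-throwing overload; an error counts as "not a file"
    if (std::filesystem::is_regular_file(args[0], ec)) {
        std::ifstream in(args[0], std::ios::binary);
        std::ostringstream contents;
        contents << in.rdbuf();
        return contents.str();
    }

    // Case 2: not a file, so speak all of the arguments, joined by spaces.
    std::string joined;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) {
            joined += ' ';
        }
        joined += args[i];
    }
    return joined;
}
```

Calling textToSpeak with the arguments "notes.txt" and "extra" returns the contents of notes.txt if that file exists, and the string "notes.txt extra" otherwise.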

Say.exe is written in C#, and targets the .NET Framework version 3. It should run on Vista without any other downloads required, but XP computers that don't have it yet will need to download .NET Framework 3.0 first.

Please note that both of these apps were written with no real error handling to speak of. I don't think anything disastrous should go wrong with them, but for all I know they could turn mutant and eat your files, your leftovers, and your houseplants. Then again, I doubt it.

Glenn Kelman: How to Get the Most Out of a Startup [Jan. 13th, 2008|01:17 am]

I've been reading more about two things lately: startups and investing. Well, that's not entirely accurate. I've been coming to the slow realization that programming, while fun, makes up less than half of what it takes to be a successful software engineer. The thing is that professional "programmers," from what I've read, end up spending less than half their time programming. The rest of it is spent with wetware ops, interacting with (gasp) actual people, directly or indirectly. So writing, interpersonal skills, luck, and business acumen each contribute at least as much as raw programming skills do in the creation of a "good" coder. Well, I can't do much about luck, and between business and human relations, I find the business side of things more palatable, so I've been looking at how things work from a business perspective.

I've also realized that I don't particularly want to work for a super-large company, nor am I interested (at the moment) in doing fundamental research or teaching. That leaves startups, small businesses, and contracting work as the most-plausible means of gainful employment. And all three of those paths benefit greatly from knowing a bit about the business side of things. So, I've been independently investigating the subset of the business world that seems most directly applicable to my immediate future. That means reading about finance, investing, MBAs, and tech startups, as well as taking Macroeconomics over the winter session. In the last week, I've come across quite a few bits about startups, both purposefully and incidentally. Here's one of the incidental ones: a talk given at the University of Washington by Glenn Kelman. I stumbled across it because UWash's RSS feed got horked and displayed 87 older talks; Glenn's was one of the ones in the list that caught my eye.

As a CS student, I find the whole general dichotomy of "stable, big, boring" established companies versus "small, dynamic, risky, exciting" startups ever present. But there's a difference between knowing that something is different and knowing how to take advantage of that difference. Luckily, Glenn Kelman, the guy who is now CEO of Redfin and who co-founded Plumtree Software, knows a thing or two about startups. Glenn is a relaxed and funny presenter, a pleasure to listen to and learn from.

I highly recommend checking out UWashington's archive of Glenn's talk. Here's a selection of quotes from Kelman that I found interesting:
( Pretty Good Quotes )

Vista [Dec. 23rd, 2007|05:44 pm]
So, I've had Vista (64-bit) installed for about a week now. I actually like it, from a user interface perspective, more than XP. It has a lot of nice touches:

  • When you go to restart your computer, and there are programs still running, it gives you a nice list of the programs preventing a restart.

  • Searchable start menu

  • Windows Explorer Breadcrumb Bar

  • Explorer at-a-glance drive usage bars

  • More attractive icons

  • Revamped, easier-to-use sound control panel

  • Much saner (fewer spaces!) filesystem layout for user home folders.

  • Related: no more "My" prefix!

  • Related: Separate Desktop and Downloads folders.

  • Task Manager -> Resource Monitor is nice

  • Explorer's toolbar: Organize -> New Folder



Features I was looking forward to using, but didn't get a chance to play with:

  • Speech recognition

  • Per-app sound controls

  • Ink



Not-so-great things:

  • When bringing up a context menu, you have to click the actual name of a file in Details view, otherwise you get the parent folder. This bit me again and again.

  • Useless additions: Sidebar, rearranged control panels

  • User Account Control

  • Un-tweakable Explorer toolbar buttons, especially Burn

  • App compatibility: iTunes doesn't burn CDs out of the box, and bootable-USB-flash-drive utilities (based on old 16-bit code) don't work, period.

  • Explorer.exe stability issues. In one week, it unexpectedly quit five times.



Overall, I was happy with Vista for the price I paid (that is, nothing, thanks to MSDNAA). I didn't find Vista sluggish, thanks to four gigs of RAM. When I had only 1 GB installed, it was much more painful. I wouldn't recommend using it with anything less than two GB.

The not-so-great things about it were mostly ignorable or disable-able, so I would have stuck with it. Unfortunately, the other thing that went wrong was that I'd get complete system freezes, usually after a toaster box popped up saying "The graphics driver stopped responding and recovered successfully." This would happen every two or three days. Not often enough to debug, but regularly enough that, after the third time, I could see it wasn't going to stop. It happens at stock speed, and even more frequently when overclocked. Other than that, the system is Orthos-stable and Memtest-clean.

I suspect the culprit here is Vista's new WDDM driver model. It's possible that the new driver model will lead to fewer stability issues in the future. Unfortunately, at the moment, the graphics card makers have dozens of years of experience writing drivers under the old model, and roughly one year of experience of having actual people run their Vista drivers.

So, it's back to 32-bit Windows XP for me. It was a fun experiment while it lasted, and I think it's good to form one's own opinions. Maybe in another year or three, whenever I next upgrade my processor, I'll re-evaluate Vista. More likely, though, I'll move to Ubuntu by that time. Oh well.

New Computer! [Dec. 19th, 2007|03:42 am]
( Pixies )

Still working on getting my software environment back up to speed. Installed Vista Business x64 (thanks, MSDNAA!), and it worked fine for a day or two until it got into a mysterious infinite-reboot loop on bootup. Then the Ubuntu x64 install froze, as did XP x64 AND both 32- and 64-bit editions of Vista. After two further days of banging my head against the wall, Vista's install disc magically started working again, and a quick System Restore put things back in good working order. I'm still not sure what caused the CD issues in the first place. I suspect the install got corrupted by either Daemon Tools or, most likely, AnyDVD. I will very, very cautiously approach those apps in the future.

Sadly, iTunes (specifically, CD writing) doesn't work in 64-bit Windows yet. iTunes will run in a virtual machine, but sound is staticky. Better than it was on the old machine, but still not good enough. That means my options are, at the moment, 32-bit XP or Vista, or Ubuntu with Amarok or Banshee. OK, it looks like manually installing a 64-bit CD driver for iTunes makes everything except iPhone syncing work. Good enough for me; I neither have nor want an iPhone.

Anyways, Ubuntu is a possibility, but I don't (yet?) have a strong ideological preference for it, and it has a number of things going against it:

  • Migrating a 65 GB iTunes library would be a pain

  • From what I've seen, Linux doesn't have much going on in speech recognition yet. Maybe in 5 or 10 years, Google will share the 411 love...

  • On a related note, hardware support isn't quite up to par for things like the MX Revolution mouse. There are hacks that will let you change the mouse wheel mode from the CLI, but nothing like SetPoint's context-sensitive behavior switching.

  • I do like playing the occasional game, which was after all the entire point of getting a nice graphics card in the first place. Thus, I could either run Linux and dual-boot, or just run Windows.


In the future, when I have more money and less time, the solution to all of these problems will probably be to buy a Mac. For now: I actually like Vista so far. The plan is to craigslist 2 of the 6 GB RAM in the pic now, before memory prices drop even further, and make do with ~3.5 GB of RAM for the next few years.

WordPress Reinstalled [Oct. 31st, 2007|09:43 am]

WordPress got horked and wasn’t displaying the posts page properly, so I reinstalled it, which actually turned out to be much less painful than trying to track down what was causing the issue in the first place.

Oh well.


I could have sworn I already posted this… [Oct. 22nd, 2007|12:13 am]

Apologies to everyone who’s been eager to see the results of my Summer of Code project. School has left me busier than I expected to be.

That said, here are instructions for building render-jp2. Replace $OBJDIR and $SRCDIR with the appropriate values, depending on your .mozconfig.

  1. Download and build Mozilla’s source through CVS.
  2. (from the root mozilla dir:) svn co http://svn.eschew.org/projects/mozilla/jp2/libjasper/mozilla/extensions/render-jp2/ extensions/render-jp2/
  3. cd $OBJDIR
  4. $SRCDIR/build/autoconf/make-makefile extensions/render-jp2/
  5. make -C extensions/render-jp2/

This results in an XPI in $OBJDIR/dist/xpi-stage/ named something like render-jp2.r294.PLATFORM_ARCH-GECKOVERSION.xpi.

There are also updated Firefox 3 render-jp2 builds for Mac OS X and Windows. Builds for Linux and Firefox 2 should be forthcoming.

