The Lutheran Geek

The life and times of a WoW-playing, Java-programming dude in Chicago

Wednesday, February 25, 2015

Formal Languages, part 2: Regular Expressions and Regular Languages

Before I start in on this entry, I need to give credit where credit is due and note that my main reference source for these blog posts on formal languages is my grad school textbook, An Introduction to Formal Languages and Automata, Third Edition by Peter Linz (the 5th edition is linked). If you’re out there, Peter, thanks for writing an awesome intro to this nerdy topic, and congratulations on having a great first name. 🙂

So let’s talk about regular expressions. Again, we’re not talking about the Perl version with lazy matching and look-ahead and look-behind and all that magic; we’re discussing austere, formalized regular expressions. The idea is that a regular expression matches a set of strings, and those strings make up a regular language. (This is jumping the gun a bit because we actually define regular languages using finite automata, but we’ll come back to that later.) For formality’s sake, we first note that the empty set ∅ and the empty string λ themselves are regular expressions; the former is, of course, a set of zero sentences, while the latter is a set of one sentence, namely the empty string. In addition, any member a in a given alphabet Σ is also a regular expression of a single sentence one symbol long. We then add on concatenation (r1·r2, or more simply, r1r2), alternation (r1 + r2, which is expressed in Unix and Perl REs as r1 | r2, and means “r1 or r2“) and star-closure (r*, i.e. zero or more rs in a row), where the rs are composed of the primitive regular expressions ∅, λ, and all a ∈ Σ. (We almost always ignore the ∅ and λ, however, and just talk about combinations of symbols in the alphabet Σ.) We can also group sub-expressions with parentheses. So this way, we can say things like “a language with an even number of as followed by an odd number of bs”, which would be expressed as:

r = (aa)*(bb)*b

To break that down, we first see that (aa)* represents any number of aas: 0, 1, 2, etc., and so gives us an even number of as. The second part, (bb)*b, indicates that we again have an even number of bs, but then there is one extra b at the end, giving us an odd number. We can give many examples of regular expressions: “all bit strings (i.e. strings of 0s and 1s) which has at least one pair of consecutive 0s”, “all strings on an alphabet such that the length is divisible by 3”, “all bit strings that, when interpreted as a binary number, is greater than or equal to 39”, and so on. So it seems like we can define quite a few complicated languages using just this simple way of expressing patterns! However, remember the example from part 1:

/^1?$|^(11+?)\1+$/

This isn’t a true regular expression, as we noted – you can’t do things in regular expressions like “lazy matching” or “back-referencing”. All we can say is “this followed by that”, “this or that” and “zero or more thises”. But maybe it’s possible that there is a regular expression that does what the above does, namely match all composite numbers expressed in unary notation. Try if you’d like, but you’ll soon see it’s futile. In another post, I’ll explain the goofily-named pumping lemma which is used to prove that a particular grammar does not represent a regular language, and thus cannot be represented by a regular expression. We’ll then use the pumping lemma to prove that we can’t construct a regular expression for this language.

I think that’s enough for now. Next time, we’ll discuss finite automata and how they’re used to truly define regular languages. As a preview, we’ll see that there are two types: deterministic automata which require that you always know what to do next, and non-deterministic automata which allows “guesses” as to how to proceed. The magic part, as we’ll see, is that they are equivalent! But you’ll have to come back later to see what I’m talking about. 🙂

posted by Peter at 9:34 pm  

Sunday, February 15, 2015

Formal Languages, Part 1: Basic Concepts

Last week, there was a link posted on Hacker News that described a regular expression that only matches non-prime numbers (expressed as a string of “1”s, i.e. in so-called “unary” notation):

/^1?$|^(11+?)\1+$/

Commenters noted that this is not a real regular expression. But what does that mean? Anyone with experience in Perl regular expressions can see that this clearly will work just fine; the linked article does a good job of breaking down how this goes about matching (or not matching) a given string of 1s. You see, the history of Unix has sort of muddied the term “regular expression” to the point that it means something in practice that is broader than the theory. Originally, the old Unix command line tool grep was designed with classical “regular expressions” in mind. However, as is well-known, classical regular expressions are very limited, and so a separate tool called egrep was written to accept so-called “extended regular expressions”, and eventually egrep‘s notion of an extended RE is what we now call a “regular expression”.

So what is a “classical” regular expression? To explore that question, we need to look into the field called “formal languages”. Despite the name, the field has little to do with programming languages or spoken languages, although it is used in theoretical studies of these kinds of language. Formal language study is the study of patterns, in a way. It asks: what is the simplest pattern which can describe this (usually infinite) set of strings? So to understand what a regular expression is, we need to set down some definitions first that will allow us to talk about formal languages in general. Once we do that, we can talk about the family of so-called “regular languages” that regular expressions encompass, and go beyond that to define more powerful languages.

Basic concepts of formal languages

language is just a set of strings that are constructed from an alphabet. An alphabet is simply a finite set of symbols such as ‘a’, ‘b’, ‘c’, etc. Traditionally, an alphabet is denoted with the symbol Σ. Strings are composed of zero or more symbols from the alphabet – the so-called empty string is traditionally denoted by λ.

Since strings can contain many consecutive copies of a particular symbol, we can use a shorthand to represent repetition:

an = aaaaaa… (n times)

For n = 0, we define

a0 = λ

When starting with an alphabet Σ, it’s useful to talk about the possible strings it can make. Σ* is the set of all strings consisting of zero or more symbols from Σ. So say that our alphabet was very simple: it just consisted of {a}. In this case,

Σ* = { λ, a, aa, aaa, aaaa, … }

or, more succintly,

Σ* = { an : n ≥ 0 }

Σ* always contains λ; if we want to exclude it, we define

Σ+ = Σ* – { λ }

A language, then, is just a subset of Σ* – perhaps in a way we can define succinctly, perhaps not. A string that is a member of a language is called a sentence. Remember, these are formal, abstract terms, so don’t get too hung-up on the real meaning of “language” or “sentence” here.

Finally, we have the concept of “grammars”. Again, this isn’t like a grammar you’re used to, although it’s quite close! Sentences in various languages can only be constructed in certain ways; some (like English) are subject-verb-object, some (like Japanese) are subject-object-verb; even others use different combinations. So a basic grammar rule in English is expressed as:

[sentence] ⇒ [subject] [verb] [object]

Subjects and objects can be further defined as agglomerations of nouns and articles and such. In the same way, we define grammars with our languages and alphabets:

G = (V, T, S, P)

G is a grammar described as a combination of three sets and one element:

  • V is defined as a set of variables; as few as one. These are usually denoted with capital letters such as S, A, B, etc.
  • T is defined as a set of terminal symbols. This is almost always the alphabet Σ that we are already familiar with
  • S is a single member of V, referred to as the start variable; by convention, it’s usually just the letter S. This variable is assumed to be, naturally, the start of any derivation using our grammar.
  • P is a set of productions similar to our grammar rule above. Productions can contain any number of variables and/or terminal symbols on either side of the arrow, though we specify that the left side contains at least one symbol, while the right side can be empty (i.e. λ).

A language, thus, can be defined as all the sentences such that there is some chain of productions in P called a derivation that produces the sentences – we denote this language as L(G), or the language derived from the grammar G.

To wrap up, here’s a simple grammar:

G = ({S}, {a, b}, S, P)

where P is given by the two productions:

S ⇒ aSb
S ⇒ λ

We can derive the sentence “aabb” as so:

S ⇒ aSb ⇒ aaSbb ⇒ aabb

By applying the first rule 0 or more times, followed by the second rule, it’s pretty easy to see that this grammar defines a fairly simple language:

L(G) = { anbn : n ≥ 0 }

So now we can talk about formal languages. Next time, we’ll actually get into regular languages, although along the way we will have to detour and talk about “automata”. Robots? Well, not really. More like a robot that defines a language. 🙂

posted by Peter at 8:22 pm  

Sunday, February 8, 2015

An engineer’s devotional

Back in 1998, I was a volunteer at Christ Lutheran Church in Albany Park. I worked there as a youth worker for a year through the Lutheran Volunteer Corps. Being right out of college, it was a great way to spend a year to start finding my way in the post-adult world, as well as to give back a little before I got consumed in the world of work. This past week, the pastor, Tom Terrell, contacted me and others asking us if we were willing to contribute entries for his congregation’s Lenten devotional. It was really great to hear from him, and considering what a crazy week this has been personally (more about that some other time), I accepted the opportunity. Here is my humble contribution.


O LORD, my heart is not lifted up,
my eyes are not raised too high;
I do not occupy myself with things
too great and too marvelous for me.
But I have calmed and quieted my soul,
like a weaned child with its mother;
my soul is like the weaned child that is with me.
O Israel, hope in the LORD
from this time and forevermore.

— Psalm 131 (NRSV)

These days, I work as a software engineer. I’m fortunate enough to work with a group of people who are very down to earth and wonderful people, but I’ve seen some of the darker sides of what people can become in this line of work: arrogant, dismissive, and forgetful of the fact that working with the people around them matters more than the programs they design. It can be a struggle to be a more gentle soul in this line of work when an unstated expectation of your job is that you should always assert yourself to the front of every discussion.

God wants us to be the best “us” we can be. God does not expect us to fit into one particular mold, no matter what our particular gifts are. The psalm I quoted above encourages me that God especially loves those who take the time to be quiet and try to understand: understand life, understand our role in the world, understand love, understand ourselves. God also does not leave us alone in this struggle, but is a partner with us, hand quietly on our shoulder as we go through life.

posted by Peter at 5:32 pm  

Monday, July 28, 2008

Life on hold

I found out today it will be “three to four weeks” until I am able to close on my condo. This comes one week after the nominal close date specified in the contract I signed back in June, a date the selling agent had agreed to in advance. This whole situation is frustrating beyond belief. I hiave never felt so powerless over a situation. Everyone on “my side” is assuring me that they are working their hardest to resolve this situation. But there’s nothing I can do. The ironic part of all this is that my preparedness to get everything lined up for the new condo is actually working against me. I’ve had to call ATT, DirecTV, Comcast, etc. to let them know that, no, I am not actually moving when I said I was.

So now I go into a holding pattern. Thankfully, I was able to extend my lease of my apartment in Evanston for a month while this nonsense gets sorted out. There’s really just nothing to say. I was hoping to make a post here explain the process and what’s holding me up, but frankly, I just don’t have the energy right now. I’ve been spending too much time today just feeling defeated. It’s really all I can muster right now; just that feeling of “well crap, there’s nothing I can do”. Everything’s on hold: my new place to live, the settling in, my parents coming to visit my new place, getting a dog. All of it has to wait. It’s all incredibly unfair; I just want to scream and cry, but what good can that do?

Ugh. That’s about all I can muster now. Just, ugh. 🙁

posted by Peter at 2:26 pm  

Friday, July 25, 2008

Musical art

So I’ve been listening to Pandora Radio a bunch lately. It is an amazing site, especially to test-drive new artists. In my case, I’ve been listening to stations based on Imogen Heap and Regina Spektor. They’re both solo female artist singer-songwriters, but their styles differ quite a bit. Imogen tends to do a lot of heavily-overdubbed songs with layers on layers of vocals and various instruments, with the vocals occasionally transformed via a vocoder, notably in her famous and gorgeous song “Hide and Seek“. Regina Spektor comes from the anti-folk movement in NYC, a stupidly-named scene that takes a lot of the musical traditions of Dylan and such and carries them forward to the present day with various lyrical themes. Her music is spare; about half of her songs are just her and a piano. “Samson” is a song from her latest album that came out a couple years ago, a gorgeous little ditty about first love viewed in retrospect.

Inspiring music. Gorgeous, amazing songs that make me wish I could write music like this. It’s so sucky. I played violin and piano for years, but as a kid, I just never got into it. I was much more into playing games on my Nintendo than practicing those instruments. Perhaps it was my way of passive-aggresively rebelling against authority and all the duties I felt were pressing in on me. I blew it, in any case. I could go on psychoanalzying myself, but the point of all this was to say, just, wow, I really wish I could get these kind of soundscapes in my mind that these people seem to be able to do. One thing I want to do with my new condo (when I actually can move into it! arrrrgh) is pick up a keyboard and start trying to relearn some piano chops and try my had at a little composition with Garage Band. I have no illusions of ever becoming anything like Imogen or Regina (and what is it with me and female artists?…), but I’d just love to be creative again somehow. WoW is fun and all, but it is ultimately not at all a creative endeavor, in the literal sense that nothing is created through your time playing the game. It’s the social aspect that makes it interesting, and why I’m still with the group of friends I’ve been with for almost 2 years now, save a couple months early last year.

Meh, as usual, I’m rambling more than saying anything interesting. But yeah. Friday afternoons are a great time to think about life, since you don’t have much else to do. 😛

Peace.

posted by Peter at 4:41 pm  

Sunday, June 24, 2007

Going on vacation

So I’m leaving for a while for my parents’ place starting Friday 6/29 – I’ll be back in Chicago around on July 8. One thing I am going to try to do while gone is get my mind back in order. I have been spiraling pretty quickly down into a pseudo-depression yet again, and I’ve been lashing out at friends along the way, people who have gone out of their way to help me and listen to me. At least a few friends this past week have told me they’re fed up with it. The more tightly I cling, the more desperate I become, and it’s just getting worse. My greatest fear is losing friends – I just haven’t had much luck in keeping them because of my extreme self-consciousness. Every little thing – every well-meaning ribbing, misspoken word, everything negative becomes amplified. Everything positive becomes diminished. It’s worse than seeing the world in purely black and white with no shades of grey – eventually you only see black.

It’s even gotten to the point where it’s affecting my work – nothing I do there seems to matter, and these roller-coaster rides of emotion I’ve been on are getting worse and worse. What’s even more terrible is comparing my own situation with those around me – what the hell do I even have to complain about? I have a great job, loving parents, awesome friends. How dare I complain?

And yet, I do. It’s a tough spot to be in. So anyway, to those I’ve hurt (I’m talking especially to you, Muzz/Dave), I apologize. You guys, my friends, have given a lot of yourself that I cannot repay. Maybe I’ll find my way back to sanity, maybe I won’t. Please be assured that I’m trying, though. Also know that I’m not blaming anyone for the state I’m in – not even myself. It is what it is. I don’t expect pity or condolences or anything – I just wanted you guys to know where I am right now.

posted by Peter at 7:35 pm  

Powered by WordPress