To all recruiters, welcome.

I'm available to hire immediately in the greater Seattle area. Eastside is preferred since I live in downtown Bellevue, but I'll commute if I must.

No telecommuting, please. I want to work in an office. I learn more from my team that way.

**Environment:** Looking for an environment where a smart, hard-working, and knowledgeable software craftsman (in training) can thrive, grow, and learn. I will make mistakes, but will always work hard to keep from being a burden on my team. I won't bring my ego to work. I'll accept when I'm wrong and take the blame when I goof up.

**Team:** I want to work with a professional team and write great software, with real testing, debugging, and profiling. I'm looking to build quality software.

**Work/Life Balance:**
I've lived the startup life for the last 7 years, and I'm tired. I can't work 84 hours a week anymore. Don't get me wrong, I'll put in time after work hours to improve my skills and get up to speed with the team, and work extra for the occasions when there are time-sensitive project deadlines. But it's not sustainable all the time. No more startups, either. If I join, or start, another startup, my wife will cry.

**Technical:** Bring me hard problems. I can learn anything. If you need me to become an expert on OS internals, geometry algorithms, compiler construction, neural networks, or even assembly language, then I will.

**Language Preferences:**

- Compiled languages, preferably C or C++. The trickier the better. Java and Go are great, too. I want to work on high-performance code.
- For interpreted languages, Python please. I've done many years of JavaScript and PHP for the web, and just want some change in my life.

**Extras**:

- machine learning
- low-level engineering
- distributed systems

- I've studied full-time for the last 8 months to level up to become a software engineer.
- I'm still studying and doing coding problems on the whiteboard and coding challenge websites every day.
- I started 3 tech startups in the last 15 years.
- I can read, write, and speak Korean.
- I currently run 2 tech startups that take very little of my time (yay automation).
- I've worked as a web developer since 2001.
- One of my startups competed with Google, and we won in the end.
- My computer science study plan is the #27 most-starred project on GitHub.
- I want to become a great engineer.
- I drive a Jeep Wrangler Rubicon but I do only light off-roading because I don't want to damage it.
- I like fountain pens.
- I'm really nice.
- I dress pretty well.
- I can only drink one beer or cocktail - any more than that and I get sleepy.

More on Linkedin - let's connect: https://www.linkedin.com/in/johnawasham


I'm a fan of tries. I can't claim that I've used them in practice, but I wish I had. They're super!

Common uses:

- substring search
- spell check
- autocomplete

This is an implementation I just put together. Enjoy!

**Trie API:**

- add(word) - Add the given word to the trie.
- contains(word) - Return True if given word exists in the trie.
- suggest(prefix) - Return word suggestions for the given prefix, including prefix if it is a word.
- word_count(prefix) - Return the number of words available for the given prefix.

```
class TrieNode:
    """
    A TrieNode represents a letter and its continuing suffixes along
    a word path. This implementation keeps track of counts of suffixes.
    """
    __slots__ = ['letter', 'ends_word', 'suffixes', 'counts']

    def __init__(self, letter):
        self.letter = letter
        self.ends_word = False
        self.suffixes = {}
        self.counts = [0] * 26

    def get_letter(self):
        return self.letter

    def suffix_exists(self, letter):
        return letter in self.suffixes

    def get_suffix(self, letter):
        return self.suffixes[letter]

    def add_suffix(self, letter):
        self.suffixes[letter] = TrieNode(letter)

    def get_suffixes(self):
        for key in self.suffixes:
            yield key

    def increment(self, letter):
        index = self.letter_to_index(letter)
        self.counts[index] += 1

    def get_count(self, letter):
        index = self.letter_to_index(letter)
        return self.counts[index]

    def end_word(self):
        self.ends_word = True

    def ends(self):
        return self.ends_word

    def letter_to_index(self, letter):
        return ord(letter) - ord('a')


class Trie:
    """
    A Trie stores letters/symbols for a set of words for fast
    word, prefix, and suffix search.
    """
    def __init__(self):
        self.root = TrieNode('')

    def add(self, word):
        """
        Add the given word to the trie.
        :param word:
        :return: None
        """
        current = self.root
        for c in word.lower():
            if not current.suffix_exists(c):
                current.add_suffix(c)
            current.increment(c)
            current = current.get_suffix(c)
        current.end_word()

    def contains(self, word):
        """
        Return True if given word exists in the trie.
        :param word:
        :return: bool
        """
        end_node = self._traverse(word.lower(), True)
        return end_node is not None

    def suggest(self, prefix):
        """
        Return word suggestions for the given prefix, including prefix
        if it is a word.
        :param prefix:
        :return: list
        """
        prefix = prefix.lower()
        prefix_node = self._traverse(prefix, False)
        suggestions = []
        if prefix_node is None:
            # prefix not in trie
            return suggestions
        word = list(prefix[:-1])
        # now dfs (iterated), finding all ending words
        stack = [prefix_node]
        while stack:
            node = stack.pop()
            if node is None:
                word.pop()
                continue
            word.append(node.get_letter())
            if node.ends():
                suggestions.append(''.join(word))
            for c in node.get_suffixes():
                stack.append(None)
                stack.append(node.get_suffix(c))
        return suggestions

    def word_count(self, prefix):
        """
        Return the number of words available for the given prefix.
        :param prefix:
        :return: int
        """
        prefix = prefix.lower()
        if not prefix:
            return 0
        prefix_node = self._traverse(prefix[:-1], False)
        if prefix_node is None:
            # prefix not in trie
            return 0
        return prefix_node.get_count(prefix[-1])

    def _traverse(self, prefix, ends):
        current = self.root
        for c in prefix:
            if current.suffix_exists(c):
                current = current.get_suffix(c)
            else:
                return None
        if ends and not current.ends():
            return None
        return current
```

Is that code too long?

Here's a much shorter version, optimized for only adding words (in all lowercase) and telling you how many words it contains with a given prefix. This is much faster.

```
class TrieNode:
    __slots__ = ['letters', 'word_count', 'ends_word']

    def __init__(self):
        self.letters = {}
        self.word_count = 0
        self.ends_word = False


class Contacts:
    def __init__(self):
        self.root = TrieNode()

    def add(self, word):
        current = self.root
        for ch in word:
            current.word_count += 1
            if ch not in current.letters:
                current.letters[ch] = TrieNode()
            current = current.letters[ch]
        current.word_count += 1
        current.ends_word = True

    def partial(self, prefix):
        """
        Returns the number of words containing this prefix.
        :param prefix:
        :return:
        """
        current = self.root
        for ch in prefix:
            if ch in current.letters:
                current = current.letters[ch]
            else:
                # prefix was not found
                return 0
        return current.word_count
```
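If you want it even shorter, the same idea fits in a few lines with nested dicts. This is a quick sketch of my own (the names `make_trie` and `contains` are mine, and `'$'` is an arbitrary end-of-word marker), not part of the classes above:

```python
def make_trie(words):
    """Build a trie as nested dicts; '$' marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True
    return root


def contains(trie, word):
    """Return True if word (not just a prefix of a word) is in the trie."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return '$' in node
```

It loses the prefix counts, but it's handy for whiteboard settings.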

I thought I was done reading books for the learning phase of my study process, and was itching to get back to the coding problems phase.

But something was missing.

I had read several books and watched months worth of videos, and the code I had seen along the way was C, Java, C++, but not too much in Python. Because Python is my language of choice for the interview, I felt I was lacking in seeing code that represented the data structures and algorithms I was learning.

I bought Grokking Algorithms, for review purposes and for some key chapters. The code is in Python, but some early chapters don't show very much code at all. It's a great book, but wasn't scratching my itch.

The dream book I was looking for was: Introduction to Algorithms (CLRS), **but** instead of pseudocode, I wanted all the code in Python. In addition, I can do without the countless pages of mathematical proofs.

The perfect find: Data Structures and Algorithms in Python by Goodrich, Tamassia, and Goldwasser.

The same text, in Java, is used as an optional text for the introductory algorithms course at UC Berkeley (CS 61B).

If you're interviewing with Java or C++ as your chosen language, you can use those versions of the book:

- Java
- C++ - the 1st edition has some low reviews, but 2nd edition reviews look great. See below about the free 7-day evaluation.

The textbook is large, at over 700 pages, and it was pretty rough reading it for hours every day until it was done. **But it was worth it.**

It was not only an excellent review of the material, but also a terrific resource on coding in Python 3. It uses Pythonic code all the way through, including wide use of the Python data model (for example, using/implementing dunder methods in classes).

It's a textbook on par in its breadth with Introduction to Algorithms, but without the extreme mathematical rigor/proofs.

Here's what it covers (abbreviated to save my fingers):

- Python Primer
- Object-Oriented Programming
- Algorithm Analysis
- Recursion
- Array-Based Sequences
- Stacks, Queues, and Deques
- Priority Queues
- Maps, Hash Tables, and Skip Lists
- Search Trees
- Binary Search Trees
- Balanced Search Trees
- AVL Trees
- Splay Trees
- (2,4) Trees
- Red-Black Trees

- Sorting and Selection
- Merge-Sort
- Quick-Sort
- Selection

- Text Processing
- Pattern-Matching Algorithms
- Dynamic Programming
- Text Compression and the Greedy Method
- Tries

- Graph Algorithms
- Data Structures for Graphs
- Graph Traversals
- Transitive Closure
- Directed Acyclic Graphs
- Shortest Paths
- Minimum Spanning Trees

- Memory Management and B-Trees
- Memory Management
- Memory Hierarchies and Caching
- External Searching and B-Trees
- External-Memory Sorting

Do you need to know all of this? **No.** But I want to be super-awesome not only when I interview, but also on day one.

Amazon has it, but it's expensive in hardcover. I purchased the paperback version from an Amazon vendor for $15. I think it's a no-no that they're selling the paperback (international edition) in the US, but I really don't want to shell out $100 to $150. I would've paid $60 to own it.

While you're waiting for the paper edition, **you can read a free version on Kindle** or Kindle app on iPad, Android, or desktop. That's what I did. I'm still waiting for my copy of the book to arrive.

Click the **eTextbook** tab and click "Try the eTextbook for free".

**P.S.** I found an easter egg on Amazon. When you load the page on Amazon for Introduction to Algorithms, after the page loads, Clifford Stein's name fades in at the end of the author list. I think this is intentional, because for years Stein wasn't an author. He came along later.

I wanted to spend a good bit of time gaining deeper knowledge and more experience with machine learning and data science, but just ran out of time this week.

It's the future, right now.

It's the equivalent of finding an O(log α(n)) solution for an O(n!) problem. *(That α is the inverse Ackermann function, a ridiculously slow-growing function.)*

Instead of spending 6 months writing an algorithm and thousands of rules to define how to solve a very hard problem, you let machine learning learn the rules from examples. It is, in a way, writing its own algorithm from the data you give it.
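As a toy illustration of the idea (a sketch of my own, not from any particular course): even a least-squares line fit "learns" a slope and intercept from example points instead of having them hard-coded.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# points sampled from y = 2x + 1: the fit recovers the "rule" from the data
slope, intercept = fit_line([(0, 1), (1, 3), (2, 5), (3, 7)])
```

Real machine learning does the same thing at scale, with far more parameters and far messier data.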

Most of the time spent doing data science is actually spent gathering and cleaning data (made easy with a little programming).

Google is using machine learning right now, and they see the promise and results already, but only a small percentage of Google engineers have experience with it.

Applications of machine learning at Google:

- How Google Is Remaking Itself As A Machine Learning First Company
- Large-Scale Deep Learning for Intelligent Computer Systems (video)
- Deep Learning and Understandability versus Software Engineering and Verification by Peter Norvig

There are many online courses on the subject, including a Self-Driving Car Engineer Nanodegree (I want that!), but the best intro to understand the math, statistics, and process behind machine learning is Stanford's Machine Learning Course:

Concretely, Andrew Ng rocks.

I took this course many months ago, and I found it fascinating. It has some very serious math:

Sometimes I ask myself what I’ve gotten myself into. #MachineLearning #neuralNetworks pic.twitter.com/EvJ8mMSNSK

— John Washam (@StartupNextDoor) February 26, 2016

Don't let it scare you.

The math builds up slowly so you can follow along, and the first week includes a review of linear algebra, which I hadn't seen since high school.

The course uses Matlab and Octave. You get a free license of Matlab to use during the course but I used Octave, which is an open-source alternative with similar syntax. It can also read and write Matlab files.

Matlab and Octave are great, and Matlab is widely used, very expensive, and has a language of its own.

The two main languages (other than Matlab-compatible) used in data science and machine learning are Python and R. Python has packages like scikit-learn and numpy that you can use and avoid implementing your own regressors and classifiers. In just a few lines of code, you can implement some very cool technology. In addition, Tensorflow, an open-source package for building neural networks, gives you a neural network in just a few lines.

**You can get started with machine learning today**, without any knowledge of it. Here is a short playlist of tutorials by Josh Gordon to get you started. You'll see how easy it can be:

Books! How I love them.

This book is a best-seller, and very well-reviewed. It will be the first book I tackle when I have the time.

Python Machine Learning by Sebastian Raschka

Another best seller, by an ex-Googler, no less.

Data Science from Scratch by Joel Grus

This is a preorder, but looking at the table of contents and skimming some of the content, it looks quite promising.

Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido

- Google's Cloud Machine learning tools (video)
- Tensorflow (video)
- Tensorflow Tutorials
- Courses:
- Stanford: Machine Learning
- videos only
- see videos 12-18 for a review of linear algebra (14 and 15 are duplicates)

- Neural Networks for Machine Learning
- Google's Deep Learning Nanodegree
- Google/Kaggle Machine Learning Engineer Nanodegree
- Self-Driving Car Engineer Nanodegree
- Metis Online Course ($99 for 2 months)


One example is: given four points on a 2-dimensional plane, where the first three points form a triangle, determine whether the fourth point lies inside or outside the triangle.

Another geometric problem is: given a number of points on a 2-dimensional plane, compute the minimum set of boundary points that, if connected, would contain all the points without creating a concave angle.

I'll show one of the solutions I generated in Python further below.

So how do you do it?

I got a clue from a lecture. It involves using a point as a pivot and determining which of two other points are the most clockwise from each other.

Before I watched more of the lecture, I was determined to figure out an algorithm that would solve it in a reasonable amount of time.

My idea was:

- sort the points from left to right (least value of x to largest) - O(n log n) where n is the number of (x, y) points
- starting with the leftmost point p:

- go through each point to the right of that point, and using p as a pivot, find which point is the most clockwise. O(n)
- set the most clockwise point as the new p - O(1)
- loop again with new p

- this continues until the starting point is reached - O(h), where h is the number of hull points

In order to "prematurely optimize" (I know it's bad) I was trying to make all the comparisons only on points to the right of p, but then I would need to flip and go the other way once the max x value was reached.

It was turning out to be way more complicated than it should be. I was trying to get it from O(n^{2}) down to O(n log n) but really all my optimizations were just making it O((n log n) + (n * h)).

I ended up cleaning it up and just **getting the algorithm where it was correct, not fast**. Then once it was correct, I would make it faster.

So I tore out a bunch of code and just got it working. And it worked beautifully.

I got rid of all the code that figured out if comparison points were to the right of the pivot point. They didn't help improve the complexity.

I was able to remove the sort, also. It wasn't needed. I could find my start point, the minimum x-value point, in linear time. It didn't matter what order the comparison points were in, since I was keeping track of the maximum clockwise-dness as I went along, the same as a linear search for the maximum value in an unsorted array.

This was the new way:

- Find the minimum x-value point, the initial point p - O(n)
- Using p as a pivot:

- find which other point is the most clockwise - O(n)
- set it as new p - O(1)

I ended up with h pivot points, each found by comparing all n points to pick the most clockwise one.

This is O(n * h).

So I watched the rest of the lecture and it turns out my algorithm was one of the 2 solutions. It's called the Jarvis march, aka "the gift-wrapping algorithm", published in 1973.

```
from collections import namedtuple
import matplotlib.pyplot as plt
import random

Point = namedtuple('Point', 'x y')


class ConvexHull(object):
    _points = []
    _hull_points = []

    def __init__(self):
        pass

    def add(self, point):
        self._points.append(point)

    def _get_orientation(self, origin, p1, p2):
        '''
        Returns the orientation of the Point p1 with regards to Point p2 using origin.
        Negative if p1 is clockwise of p2.
        :param p1:
        :param p2:
        :return: integer
        '''
        difference = (
            ((p2.x - origin.x) * (p1.y - origin.y))
            - ((p1.x - origin.x) * (p2.y - origin.y))
        )
        return difference

    def compute_hull(self):
        '''
        Computes the points that make up the convex hull.
        :return:
        '''
        points = self._points

        # get leftmost point
        start = points[0]
        min_x = start.x
        for p in points[1:]:
            if p.x < min_x:
                min_x = p.x
                start = p

        point = start
        self._hull_points.append(start)

        far_point = None
        while far_point is not start:
            # get the first point (initial max) to use to compare with others
            p1 = None
            for p in points:
                if p is point:
                    continue
                else:
                    p1 = p
                    break

            far_point = p1

            for p2 in points:
                # ensure we aren't comparing to self or pivot point
                if p2 is point or p2 is p1:
                    continue
                else:
                    direction = self._get_orientation(point, far_point, p2)
                    if direction > 0:
                        far_point = p2

            self._hull_points.append(far_point)
            point = far_point

    def get_hull_points(self):
        if self._points and not self._hull_points:
            self.compute_hull()
        return self._hull_points

    def display(self):
        # all points
        x = [p.x for p in self._points]
        y = [p.y for p in self._points]
        plt.plot(x, y, marker='D', linestyle='None')
        # hull points
        hx = [p.x for p in self._hull_points]
        hy = [p.y for p in self._hull_points]
        plt.plot(hx, hy)
        plt.title('Convex Hull')
        plt.show()


def main():
    ch = ConvexHull()
    for _ in range(50):
        ch.add(Point(random.randint(-100, 100), random.randint(-100, 100)))
    print("Points on hull:", ch.get_hull_points())
    ch.display()


if __name__ == '__main__':
    main()
```

The other algorithm, which runs in O(n log n), sorts the points and then makes a single pass over them, taking only left turns as it travels the perimeter counter-clockwise. When the next point would make a right turn, it backtracks (using a stack and popping points off) until that turn becomes a left turn. This algorithm is called the Graham scan.
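Here's a rough sketch of my own of that sort-and-scan idea (strictly speaking, this is the monotone chain variant, a close cousin of the Graham scan; the helper names are mine):

```python
def cross(o, a, b):
    # > 0 when going o -> a -> b is a counter-clockwise (left) turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order."""
    points = sorted(set(points))
    if len(points) <= 2:
        return points
    # build the lower hull, popping while the turn isn't a left turn
    lower = []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # same pass in reverse for the upper hull
    upper = []
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # last point of each half is the first point of the other
    return lower[:-1] + upper[:-1]
```

The sort dominates, so the whole thing is O(n log n) regardless of how many points end up on the hull.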

Which algorithm is better? It depends on your points. If most of the points lie on the hull, the O(n log n) algorithm will be better. If relatively few hull points bound most of the points, the O(n * h) algorithm will be better. You could always plot a random sample of the points on a graph and then choose your algorithm from there. I think most point sets that resemble randomness will benefit from the Jarvis march.

Today I'm studying flow graphs and the disjoint-set data structure.

Maximum flow falls into the category of combinatorial optimization problems. The Ford-Fulkerson method, published in 1956, is a way of computing the maximum flow through a flow graph, which could represent a network or a set of fluid pipes. It is considered a method, not an algorithm, since there are pathological cases in which it never terminates and, further, never converges toward the optimum.

The method repeatedly finds a path from the source node to the sink node and pushes flow along it, reducing the residual capacity of each edge on the path by the minimum residual capacity found along that path. The pushed flow is accumulated, and once no augmenting path remains (meaning some edge on every path is saturated), the accumulated value is the maximum flow, which also equals the capacity of the minimum cut.

Here is Python code (from Wikipedia) that calculates maximum flow in a flow graph, slightly changed for safety when comparing a value to None:

```
class Edge(object):
    def __init__(self, u, v, w):
        self.source = u
        self.sink = v
        self.capacity = w

    def __repr__(self):
        return "%s->%s:%s" % (self.source, self.sink, self.capacity)


class FlowNetwork(object):
    def __init__(self):
        self.adj = {}
        self.flow = {}

    def add_vertex(self, vertex):
        self.adj[vertex] = []

    def get_edges(self, v):
        return self.adj[v]

    def add_edge(self, u, v, w=0):
        if u == v:
            raise ValueError("u == v")
        edge = Edge(u, v, w)
        redge = Edge(v, u, 0)
        edge.redge = redge
        redge.redge = edge
        self.adj[u].append(edge)
        self.adj[v].append(redge)
        self.flow[edge] = 0
        self.flow[redge] = 0

    def find_path(self, source, sink, path):
        if source == sink:
            return path
        for edge in self.get_edges(source):
            residual = edge.capacity - self.flow[edge]
            if residual > 0 and edge not in path:
                result = self.find_path(edge.sink, sink, path + [edge])
                if result is not None:
                    return result

    def max_flow(self, source, sink):
        path = self.find_path(source, sink, [])
        while path is not None:
            residuals = [edge.capacity - self.flow[edge] for edge in path]
            flow = min(residuals)
            for edge in path:
                self.flow[edge] += flow
                self.flow[edge.redge] -= flow
            path = self.find_path(source, sink, [])
        return sum(self.flow[edge] for edge in self.get_edges(source))


def main():
    g = FlowNetwork()
    [g.add_vertex(v) for v in "sopqrt"]
    g.add_edge('s', 'o', 3)
    g.add_edge('s', 'p', 3)
    g.add_edge('o', 'p', 2)
    g.add_edge('o', 'q', 3)
    g.add_edge('p', 'r', 2)
    g.add_edge('r', 't', 3)
    g.add_edge('q', 'r', 4)
    g.add_edge('q', 't', 2)
    print(g.max_flow('s', 't'))


if __name__ == '__main__':
    main()
```

This well-known article, Numbers Everyone Should Know, was intriguing, but it was hard to mentally get a handle on the scale of the numbers.

Here are a few:

So I converted some of them into round trips, since data accesses are round trips in themselves. I started with an L1 cache access, at 0.5ns, or a 2 ft round trip. So 1ns = 4 feet (1.219 meters). Read on for the rationale.

Imagine you're sitting at your desk, and need some information in a book or on a sheet of paper. The most convenient access of that data would be to have it handy right there on your desk, about a foot away. You reach over (1 foot or 0.3 meters) and grab it, and pull your arm back to where you can read it, another 1 foot. That's a 2 ft (0.61 m) round trip. That's equivalent to an L1 cache access. It was in L1, eh? Lucky you! You didn't even have to get out of your chair.

For the next one, you have to get out of your chair and walk across the (large) room to a bookshelf, 14 feet away, and return to your chair, so you'll have it handy next time. That's a 28 ft round trip: a 7ns L2 cache access.

You may have to be in a library for this one, or perhaps this book is upstairs in a room at the other end of the house. Walk 50 feet, get the book, and walk it all the way back down to your desk. No peeking at it on your way back. That 100 ft round trip is a 25ns mutex lock/unlock.

You're going to have to go outside this time. At 100ns (a main memory reference), you're going to have to walk down the block to pick up the book at your friend's house: 200 feet there and 200 feet back.

This number was pulled from a talk by Chandler Carruth (Googler) at CppCon 2014: "Efficiency with Algorithms, Performance with Data Structures". I wanted a number that would demonstrate the difference between SSD and spinning disk.

Keep in mind you can't drive. That would be cheating. Put on your good walking shoes and grab a water bottle and some protein bars. You're walking 57 miles to get that book, and 57 miles back home (roughly a 150μs SSD access). You must really need that book.

That's the distance from the New York Stock Exchange (lower Manhattan) all the way to Danbury, CT. You may get there in time to watch the colors change.

Zoomed out for scale:

Oh boy. For this journey, you're gonna wish the folks in the datacenter had upgraded to SSD. You thought the upgrade from 5400 RPM to 15000 RPM was going to really help. Not this time. For a 10ms disk seek, you have to walk 3,788 miles, then walk 3,788 miles all the way back to your desk.

That's the distance from Anchorage, Alaska to Orlando, Florida.

This was just an extra one for fun, to highlight the difference between disk and a long-distance network access. A packet exchange like this would only be 2/3 of the SYN/ACK handshake for TCP, or an entire round trip for UDP.

Put on your moon boots. For a 150ms round trip (think a packet from California to the Netherlands and back), you're walking (theoretically) almost halfway to the moon. And then walking back.

It's in Google Sheets, naturally.
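If you'd rather compute the conversions than read a spreadsheet, here's a sketch. The latency figures are the commonly cited ones from the "Numbers Everyone Should Know" list; they're approximate and vary by source:

```python
FEET_PER_NS = 4  # anchored on: a 0.5 ns L1 access = a 2 ft round trip


def round_trip_feet(latency_ns):
    """Convert a latency in nanoseconds to a walking round trip in feet."""
    return latency_ns * FEET_PER_NS


latencies_ns = {
    'L1 cache reference': 0.5,      # 2 ft: grab it off your desk
    'Main memory reference': 100,   # 400 ft: down the block and back
    'Disk seek': 10 * 1000 * 1000,  # ~7,576 miles round trip
}
```

At 5,280 feet per mile, the 10ms disk seek works out to about 7,576 miles, matching the Anchorage-to-Orlando round trip above.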

Today I studied cryptography and compression, so I'm trucking along. These topics are not on the coaching docs I've seen, but I added them to round out my studies. I'm basically trying to compress a 4-year CS degree into less than 1 year.

It's lossy compression.

Khan Academy has an excellent series on cryptography, and its prerequisite topic, information theory. Links are below.

Some of my notes for your enjoyment:

I found these great videos by Developer Advocate Colt McAnlis on the Google Developers YouTube channel.

I've embedded the playlist below.

For links to everything I'm studying, check out my project on github:

I've heard the term "entropy" over the years and never understood it until now. I used to think entropy was randomness, and it's kind of related.

Entropy can be defined in a few ways, some more complicated than others.

Entropy is a measure of how much information is gained (measured in bits) when an outcome is revealed.

Let's say you have a coin flip with a fair coin. There is a probability of 1/2 that it will be heads, and 1/2 that it will be tails. Since either outcome is equally likely, entropy, or the unpredictability of the outcome, is at its maximum.

If you had an unfair coin, and knew that 60% of the time it would be heads, then entropy has decreased, because there is some predictability in play.

How much has the entropy decreased?

Shannon's equation for entropy, here represented as H, is the sum over outcomes of each outcome's probability times the logarithm of 1/probability:

H = Σ p_{i} * log_{2}(1/p_{i})

Since 1/p is just p^{-1}, the exponent can be moved out of the logarithm as a factor of -1, giving the equivalent form H = -Σ p_{i} * log_{2}(p_{i}). I use the first form in the calculations below.

Fair coin: H = 2 * (.5 * log_{2}(1/.5)) = 1

Unfair coin: H = .6 * log_{2}(1/.6) + .4 * log_{2}(1/.4) ≈ 0.970951

2-headed coin: H = 1 * log_{2}(1/1) = 0

So the unfair coin's entropy decreased by about 0.03 bits of information. When entropy decreases from its maximum, that means that information has leaked. This information leak can be exploited in naive versions of cryptography to break it. Modern cryptography maximizes entropy so that the encryption output is indistinguishable from randomness.

Note the 2-headed coin has 0 entropy, because the outcome is reliably predictable.

I went to the board and figured out that n equally likely outcomes have entropy log_{2}(n). So entropy for a coin flip and a die roll are different. A higher n means higher entropy, because the unpredictability increases as n increases.

Coin flip (fair coin): 1

Dice roll (fair die, 6-sided): 2.58

Dice roll (fair die, 10-sided): 3.32

Dice roll (fair die, 20-sided): 4.32

For powers of 2, these are easy to calculate: log_{2}(8) = 3, for example.

Shannon showed that the minimum encoding for n equally likely messages is log_{2}(n) bits. It takes 1 bit to encode 2 items of information, 2 bits to encode 4 (00, 01, 10, 11), 3 bits to encode 8, 5 bits to encode 32. Since we add 0 into the mix for integers, it takes log_{2}(n) bits (rounded up) to encode the integers from 0 to n - 1.
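A quick way to sanity-check these numbers (a sketch of mine; `entropy` is just Shannon's formula applied to a list of probabilities):

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H = sum(p * log2(1/p)) over outcomes with p > 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


fair_coin = entropy([0.5, 0.5])    # 1.0 bit
biased_coin = entropy([0.6, 0.4])  # ~0.971 bits
two_headed = entropy([1.0])        # 0 bits
die_6 = entropy([1 / 6] * 6)       # log2(6) ~ 2.58 bits
```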

Entropy comes into play in:

- compression
- encryption
- data transmission
- error detection and correction

So it's worthwhile to learn!

Yes, you can sort in linear time, as long as you're avoiding comparisons.

Radix sort and counting sort are two examples. They both avoid comparisons, and are possible because they sort based on grouping by a single element at a time, where that single element is a letter or digit (in mathematical terms, elements of a limited alphabet).

Comparison-based sorting will never be faster than O(n log n), because any comparison-based sort can be represented as a decision tree: each internal node compares two elements, and each leaf is one of the n! possible orderings (permutations) of the input.

A binary tree with n! leaves has height of at least log_{2}(n!), which is on the order of n log n.
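You can check the growth of log_{2}(n!) numerically (a quick sketch):

```python
import math

n = 16
height = math.log2(math.factorial(n))  # minimum decision-tree height, ~44.25
upper = n * math.log2(n)               # the familiar n log n bound, 64.0

# log2(n!) sits at or below n * log2(n), but grows at the same order,
# which is where the O(n log n) lower bound for comparison sorts comes from
```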

My example of linear sorting, taken from Programming Pearls by Jon Bentley, outlines a sorting problem limited not by runtime, but by space.

Task:

- Sort the numbers in a given file.
- Output the sorted numbers to another file.

Here are the conditions:

- There are at most 10,000,000 7-digit integers in the file, one on each line.
- There are no duplicates.
- You have only 1 MB of memory.

You can see some problems right away. If we have 10,000,000 integers, and each is a 32-bit integer, that's 40 MB of data to sort. Just 1 MB? What is this? 1985? In this case, I think it was. The system's overhead and other programs only allowed 1 MB of space to sort the data. This was part of desktop software. Good luck.

The solution is creative, and takes advantage of the fact there are **no duplicates**.

The solution the author comes up with is: store the integers in a bit vector. For each number in the file, turn on its corresponding bit in the bitmap. So if the number is 4762, flip the 4,762 + 1st bit (reserving 0 in case it shows up). Then once you've flipped all the bits on for each number in the file, iterate through the bitmap and output the corresponding bit index to a file.

This solution lets you represent all 10,000,000 possible numbers (0 to 9,999,999) in only ⌈10,000,000 / 32⌉ = **312,500 32-bit words**, about 1.25 MB.

Normally you wouldn't think of this because of data loss. You're collapsing a 32-bit integer into a single bit. But because the bit's *position* encodes the value, we have a built-in way to reconstruct the data, and it costs no extra space to do so.

*n*: the number of items to sort

- Initializing a large bit vector to all zeroes: ~1.1*n operations (1 bit for each possible integer, even those not present in the input; 10,000,000 possible values / 9,000,000 input numbers ≈ 1.1)
- Reading the file: n operations
- Setting the bits: n operations
- Reading the bits from the bitmap: ~1.1*n operations (see above)
- Writing numbers to file: n operations

This gives us about 5.2 * n operations, or **O(n)**. With n = 9,000,000, that's somewhere around 47 million operations.

Note, there are no comparisons involved in this algorithm.

If this were comparison-based, the best we could hope for is O(n log n). With n = 9,000,000, log₂ n ≈ 23.1, so n log n is approximately 208 million operations.

But operations aren't really what we're going for here. It's limited space we're worrying about.

A month after I read this solution, I went to implement it in Python. You'll see a C implementation near the end of the post.

I decided to try it with 9,000,000 unique numbers, from 0 to 9,999,999.

Here's the code to generate the numbers into a file:

```
import random

def main():
    # sample without replacement guarantees unique numbers
    numbers = random.sample(range(10000000), 9000000)
    with open("numbers.txt", "w") as out_file:
        out_file.write("\n".join(str(n) for n in numbers))

if __name__ == "__main__":
    main()
```

This uses random.sample, which is perfect for generating unique numbers in a range.

This created a **71 MB file**.

I created a Python class called Bitsort to implement the bitmap solution:

```
class Bitsort(object):
    def __init__(self, max_number):
        self._int_bits = 32
        self._buckets = (max_number // self._int_bits) + 1
        self._bit_array = [0] * self._buckets

    def save_number(self, number):
        bucket = number // self._int_bits
        bit_number = number % self._int_bits
        self._bit_array[bucket] |= 1 << bit_number

    def get_sorted_numbers(self):
        for index, bits in enumerate(self._bit_array):
            base = index * self._int_bits
            if not bits:  # skips empty buckets
                continue
            for j in range(self._int_bits):
                if bits & (1 << j):
                    yield base + j

def main():
    bitsorter = Bitsort(9999999)
    with open("numbers.txt", "r") as in_file:
        for line in in_file:
            bitsorter.save_number(int(line.rstrip()))
    with open("out.txt", "w", 4096) as out_file:
        for number in bitsorter.get_sorted_numbers():
            out_file.write(str(number) + "\n")

if __name__ == "__main__":
    main()
```

In Python, everything is an object, and objects take many times more memory than the underlying primitive data types.

In order to make it easy to index into the right place to set/get bits, I used a Python list and indexed into a "bucket" where I could set the correct bit (from 0 to 31). The number of buckets: (max possible number // bits per integer) + 1.

That's 312,500 buckets. I'm using a list to hold 312,500 integers.

**Sizes:**

- Empty Python list: 64 bytes
- Python list of 1 integer: 72 bytes (64 + 8)
- Python list of 2 integers: 80 bytes (64 + 8 * 2)
- Python list of 312,500 integers: 2,500,064 bytes (64 + 8 * 312,500)

So we're already *way over* our memory allotment. There's just too much object size overhead. However, it's still *much less* than the 40 MB it would normally take as raw 32-bit integers. We're using only about **6% of the memory** we'd normally use!
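
You can verify these sizes yourself with `sys.getsizeof` (a sketch; the exact header size varies by Python version and platform, but the 8 bytes per slot is what matters on 64-bit CPython):

```python
import sys

# A 64-bit CPython list stores one 8-byte pointer per element,
# plus a fixed header (56 or 64 bytes depending on version).
header = sys.getsizeof([])
per_slot = sys.getsizeof([0] * 2) - sys.getsizeof([0] * 1)
bitmap_list = sys.getsizeof([0] * 312500)
print(header, per_slot, bitmap_list)
```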

I used memory-profiler to see what kind of memory this sorting application was taking overall.

Memory usage peaks at 27 MiB, which is much higher than expected.

I then changed to use the bitarray module.

The class conversion to use it was pretty easy. Everything else stayed the same, due to good encapsulation.

```
import bitarray

class Bitsort(object):
    def __init__(self, max_number):
        # note: this builds a temporary (max_number + 1)-element
        # Python list before handing it to bitarray
        self._bit_array = bitarray.bitarray([False] * (max_number + 1))  # 1 for 0

    def save_number(self, number):
        self._bit_array[number] = True

    def get_sorted_numbers(self):
        for index, bit in enumerate(self._bit_array):
            if bit:
                yield index
```

My expectation was that this would be more memory efficient. Not so much.

So you want to use C and just get a slab of memory, eh pal? You know what you're doing?

I took this code from the book, with minor modifications, running it as:

```
john$ clang -ggdb bitsort.c -std=c99 -Wall -Werror -o bitsort
john$ ./bitsort < numbers.txt > out.txt
```

```
#include <stdio.h>

#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000

int a[1 + N / BITSPERWORD];

void set(int i);
void clr(int i);
int test(int i);

int main(int argc, char *argv[]) {
    int i;
    for (i = 0; i < N; ++i) {
        clr(i);
    }
    while (scanf("%d", &i) != EOF) {
        set(i);
    }
    for (i = 0; i < N; ++i) {
        if (test(i)) {
            printf("%d\n", i);
        }
    }
    return 0;
}

void set(int i) {
    a[i >> SHIFT] |= (1 << (i & MASK));
}

void clr(int i) {
    a[i >> SHIFT] &= ~(1 << (i & MASK));
}

int test(int i) {
    return a[i >> SHIFT] & (1 << (i & MASK));
}
```

I've used Valgrind for finding memory leaks for heap allocation/deallocation, but I wanted to find a tool for measuring total memory used.

I discovered Massif, and used it to output a memory graph. It's normally used to do heap profiling, but the `--stacks=yes`

flag can include stack usage.

On Ubuntu, I ran:

```
clang -ggdb bitsort.c -std=c99 -Wall -Werror -o bitsort && valgrind --tool=massif --stacks=yes ./bitsort < numbers.txt > out.txt
ms_print massif.out.8170
--------------------------------------------------------------------------------
Command: ./bitsort
Massif arguments: --stacks=yes
ms_print arguments: massif.out.8218
--------------------------------------------------------------------------------
KB
3.219^#
|#
|#
|#
|#
|#
|#
|#
|# :::::::::::::@: :: :@ ::::::@
|# :: :::::: :::@: :: :@ ::::::@
|# :: :::::: :::@: :: :@ ::::::@
|# :: :::: :::::@: : :::: ::: ::::@::: :: :::::: :::@: :: :@ ::::::@
|# :: : :: :::: @: : : :: : : :: :@::: :: :::::: :::@: :: :@ ::::::@
|# :::@: :: :::: @::: :: ::: : :@:: :@:::: :: :::::: :::@: :: :@ ::::::@
|# :::@: :: :::: @::: :: ::: : :@:: :@:::: :: :::::: :::@: :: :@ ::::::@
|# :::@: :: :::: @::: :: ::: : :@:: :@:::: :: :::::: :::@: :: :@ ::::::@
|# :::@: :: :::: @::: :: ::: : :@:: :@:::: :: :::::: :::@: :: :@ ::::::@
|# :::@: :: :::: @::: :: ::::: :@:: :@:::: :: :::::: :::@::::::@:::::::@
|# :::@: :: :::: @::: :: ::::: :@:: :@:::: :: :::::: :::@::::::@:::::::@
|#::::@: ::::::: @::::::: ::::: :@:: :@:::@@:: :::::: :::@::::::@:::::::@
0 +----------------------------------------------------------------------->Gi
0 13.37
```

It seems as though this is maxing out at about 3.2 KB, which clearly can't include the 1.25 MB bit array. The likely explanation: `a` is a global array, so it lives in the program's static data segment, and Massif measures heap allocations (plus the stack with `--stacks=yes`), not static data. Massif's `--pages-as-heap=yes` mode would count it.

Any thoughts or improvements? I like that I can wrangle an array in C with far fewer resources than in Python. I'd love to be able to have this kind of control in Python. Did I not use the bitarray module correctly?

Please comment below.

]]>I needed this for myself, and Googled around but didn't find anything satisfactory, but now after a little work, I'm all set with Valgrind.

Wouldn't you love this on your toolbar?

Well, you can have it, friend. Read on.

I have it in my current project, because I've got multiple subprojects in one project, but you can put this script in your home directory or wherever.

`valgrind.sh`

```
#!/usr/bin/env bash
if [ "$1" = "C" ]; then
    /usr/bin/clang -ggdb main.c -std=c99 -Wall -Werror -o program && /usr/local/bin/valgrind ./program
elif [ "$1" = "Cpp" ]; then
    /usr/bin/clang++ -std=c++11 -stdlib=libc++ main.cc -Wall -Werror -o program && /usr/local/bin/valgrind ./program
fi
```

**What it does:** The script takes the string `C` or `Cpp` as an argument. It then compiles your program and runs Valgrind on the executable.

**Make it executable:** `chmod +x valgrind.sh`

Update the script as necessary. This script handles both C and C++. It uses `main.c` for C and `main.cc` for C++.

Go to **Preferences** > **Tools** > **External Tools**.

Click the `+` to add an external tool.

- Fill in a name.
- Enter the path to `valgrind.sh`.
- Add `C` or `Cpp` as a parameter. This will run Valgrind the right way for your language. I made one for C, then duplicated the external tool and changed the copy to `Cpp`.
- Set the working directory to the macro `$FileDir$`, which will run the command in whatever directory (or file) you happen to have in focus.

By default, the tool you just made lives under **Tools** (on main menu) > **External Tools**. But navigating to that every time stinks.

Right-click your toolbar, click **Customize Menus and Toolbars**.

Click **Main Toolbar**.

Find the spot where you want to add the icon. Select an item and click **Add After...**

Now you have to find the tool to add there. Click **External Tools**.

Select it and click OK. It will complain that there's no icon. That's OK; the default is a cute green alien.

And it's been added!

Now when you click the button, it will compile in the working directory and run Valgrind on your executable.

Note that the "lost" memory reported there isn't from my program; it's a by-product of running on a Mac. On Linux, every lost-memory count is 0. On Mac, I keep my eye on only these:

- definitely lost
- indirectly lost

Play around with not freeing memory and you'll see what I mean.

Enjoy!

]]>Greed is good. And Dijkstra's algorithm is greedy.

Dijkstra's algorithm not only calculates the shortest (lowest weight) path on a graph from source vertex S to destination V, but also calculates the shortest path from S to every other vertex.

My implementation in Python doesn't return the shortest paths to all vertices, but it could. It returns the shortest path from source to destination, and the sum of the weights along that path.

It uses a thread-safe priority queue (a min-heap, heapq, behind the scenes) from Python's queue module: https://docs.python.org/3.5/library/queue.html#module-queue

```
import queue
from collections import namedtuple

Edge = namedtuple('Edge', ['vertex', 'weight'])

class GraphUndirectedWeighted(object):
    def __init__(self, vertex_count):
        self.vertex_count = vertex_count
        self.adjacency_list = [[] for _ in range(vertex_count)]

    def add_edge(self, source, dest, weight):
        assert source < self.vertex_count
        assert dest < self.vertex_count
        self.adjacency_list[source].append(Edge(dest, weight))
        self.adjacency_list[dest].append(Edge(source, weight))

    def get_edge(self, vertex):
        for e in self.adjacency_list[vertex]:
            yield e

    def get_vertex(self):
        for v in range(self.vertex_count):
            yield v

def dijkstra(graph, source, dest):
    q = queue.PriorityQueue()
    parents = []
    distances = []
    start_weight = float("inf")

    for i in graph.get_vertex():
        weight = start_weight
        if source == i:
            weight = 0
        distances.append(weight)
        parents.append(None)

    q.put((0, source))

    while not q.empty():
        v_tuple = q.get()
        v = v_tuple[1]

        for e in graph.get_edge(v):
            candidate_distance = distances[v] + e.weight
            if distances[e.vertex] > candidate_distance:
                distances[e.vertex] = candidate_distance
                parents[e.vertex] = v
                # primitive but effective negative cycle detection
                if candidate_distance < -1000:
                    raise Exception("Negative cycle detected")
                q.put((distances[e.vertex], e.vertex))

    shortest_path = []
    end = dest
    while end is not None:
        shortest_path.append(end)
        end = parents[end]

    shortest_path.reverse()
    return shortest_path, distances[dest]

def main():
    g = GraphUndirectedWeighted(9)
    g.add_edge(0, 1, 4)
    g.add_edge(1, 7, 6)
    g.add_edge(1, 2, 1)
    g.add_edge(2, 3, 3)
    g.add_edge(3, 7, 1)
    g.add_edge(3, 4, 2)
    g.add_edge(3, 5, 1)
    g.add_edge(4, 5, 1)
    g.add_edge(5, 6, 1)
    g.add_edge(6, 7, 2)
    g.add_edge(6, 8, 2)
    g.add_edge(7, 8, 2)
    # for testing negative cycles
    # g.add_edge(1, 9, -5)
    # g.add_edge(9, 7, -4)

    shortest_path, distance = dijkstra(g, 0, 1)
    assert shortest_path == [0, 1] and distance == 4

    shortest_path, distance = dijkstra(g, 0, 8)
    assert shortest_path == [0, 1, 2, 3, 7, 8] and distance == 11

    shortest_path, distance = dijkstra(g, 5, 0)
    assert shortest_path == [5, 3, 2, 1, 0] and distance == 9

    shortest_path, distance = dijkstra(g, 1, 1)
    assert shortest_path == [1] and distance == 0

if __name__ == "__main__":
    main()
```
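
Since `dijkstra` fills in a parent pointer for every reachable vertex, the `parents` list already encodes the full shortest-path tree; a small helper along these lines (my sketch, not part of the original) reads any path back out:

```python
def path_from_parents(parents, dest):
    # walk parent pointers from dest back to the source, then reverse
    path = []
    while dest is not None:
        path.append(dest)
        dest = parents[dest]
    path.reverse()
    return path

# parents as dijkstra() would leave them for source 0 on a 4-vertex chain
chain_parents = [None, 0, 1, 2]
```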

]]>I've been learning graphs and dynamic programming somewhat interleaved. Dynamic programming tends to help solve graph problems because:

Every problem solvable by dynamic programming can be represented in a DAG.

A couple of things I discovered in my experiments:

- Using a **queue** for graph traversal (cyclic or acyclic) always does breadth-first search.
- Using a **stack** for graph traversal (cyclic or acyclic) always does depth-first search.
- DFS (traversal using a stack) on a binary tree always does a **pre-order** traversal.

Another fact (not my discovery, that's for sure), is that **breadth-first search on an unweighted graph gives the shortest unweighted path between the starting vertex and any other vertex.**
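
A minimal sketch of that fact (the graph and all the names here are my own toy example): track a parent pointer the first time each vertex is discovered, and the first time the destination comes off the queue you have a shortest path by hop count.

```python
from collections import deque

def bfs_shortest_path(adjacency, source, dest):
    # BFS visits vertices in order of hop count, so the first time we
    # reach dest, the recorded parents give a shortest unweighted path.
    parents = {source: None}
    q = deque([source])
    while q:
        v = q.popleft()
        if v == dest:
            path = []
            while v is not None:
                path.append(v)
                v = parents[v]
            return path[::-1]
        for w in adjacency[v]:
            if w not in parents:
                parents[w] = v
                q.append(w)
    return None  # dest unreachable

adjacency = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
```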

I don't have much more to add right now. I've still got a lot to study and I feel I'm falling behind. Time is short.

Right now I'm going over:

- depth-first search
- breadth-first search
- topological sort
- minimum spanning trees:
  - Prim's algorithm
  - Kruskal's algorithm

- single-source shortest paths: Dijkstra's algorithm
- counting connected components
- listing strongly connected components
- checking for existence of a cycle


]]>I put on my sorting hat and watched many, many videos on sorting.

The videos covered:

- bubble sort
- selection sort
- insertion sort
- heap sort
- merge sort
- quicksort
- counting sort
- radix sort

There are several more sorting algorithms I could have gone into, but I had to stop somewhere. I'd like to learn about bitonic sort, signature sort, etc. So I added another few videos to the sorting section that I'll get around to.

What I really like about sorting is the terse code needed to do something very powerful. Even though comparison-based sorting theoretically can't do better than O(n log n), that's pretty darn good. Given the right constraints, such as all your keys being the same length over a small alphabet k (for decimal digits, 0..9), you can switch to radix sort and rock O(n) sorting.
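
Here's a sketch of what that looks like as an LSD radix sort over decimal digits (my own minimal version, assuming non-negative integers; the stable bucket pass per digit is what makes it work):

```python
def radix_sort(numbers, digits):
    # one stable bucket pass per decimal digit, least significant first
    for d in range(digits):
        buckets = [[] for _ in range(10)]
        for n in numbers:
            buckets[(n // 10 ** d) % 10].append(n)
        numbers = [n for bucket in buckets for n in bucket]
    return numbers
```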

Want to see some code?

Merge sort requires O(n) additional space.

```
namespace jw {

void merge(int numbers[], int low, int mid, int high) {
    // temp array for holding sorted items (one slot per item in low..high)
    int b[high - low + 1];
    int i = low;
    int j = mid + 1;
    int k = 0;

    // merge items from each half in order
    while (i <= mid && j <= high) {
        if (numbers[i] <= numbers[j]) {
            b[k++] = numbers[i++];
        } else {
            b[k++] = numbers[j++];
        }
    }

    // copy the remaining items to the temp array
    while (i <= mid) b[k++] = numbers[i++];
    while (j <= high) b[k++] = numbers[j++];

    // copy the merged items back
    --k;
    while (k >= 0) {
        numbers[low + k] = b[k];
        --k;
    }
}

void merge_sort(int numbers[], int low, int high) {
    if (low < high) {
        int mid = low + (high - low) / 2;
        merge_sort(numbers, low, mid);
        merge_sort(numbers, mid + 1, high);
        merge(numbers, low, mid, high);
    }
}

}  // namespace jw
```

Note this uses random pivot selection, and sorts in-place.

```
import random

class QuickSort(object):
    def __init__(self, numbers):
        self.values = numbers
        self.count = len(self.values)

    def sort(self):
        self.quick_sort(0, self.count - 1)
        return self.values

    def quick_sort(self, left, right):
        if left >= right:
            return

        i = left
        j = right
        pivot_index = random.randint(left, right)
        pivot = self.values[pivot_index]

        # partition: everything < pivot to the left, > pivot to the right
        while i <= j:
            while self.values[i] < pivot:
                i += 1
            while self.values[j] > pivot:
                j -= 1
            if i <= j:
                self.values[i], self.values[j] = self.values[j], self.values[i]
                i += 1
                j -= 1

        if left < j:
            self.quick_sort(left, j)
        if right > i:
            self.quick_sort(i, right)
```

And here are some notes. I take lots of notes.

Splay trees are structurally very simple. A splay tree is a binary search tree, so each node has a key, maybe a left child, and maybe a right child. There's no subtree height, ranking info, or anything else. It's the splay algorithm that gives the splay tree its magical properties.

The splay algorithm moves (splays) an accessed node (or insertion candidate's parent) to the root of the tree. So the next time it's accessed, it will be close to the root, or even the root. Values are splayed to root on insert, access, and delete (when it gets lopped off).

Splaying moves the node to the root through a sequence of subtree rotations. That's it.
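
The two rotations are tiny on their own. A sketch in Python (node and function names are mine; these are the same rotation shapes the C code below performs inline):

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def rotate_right(x):
    # lift x's left child above x; x becomes its right child
    y = x.left
    x.left = y.right
    y.right = x
    return y  # new subtree root

def rotate_left(x):
    # mirror image: lift x's right child above x
    y = x.right
    x.right = y.left
    y.left = x
    return y  # new subtree root
```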

The by-product of splaying is that the tree adjusts itself to common requests. There's no guarantee the tree stays balanced, though; it can get greatly out of whack, yet it still guarantees O(log n) amortized performance per operation.

This snippet from an MIT lecture on splay trees ends with some surprising claims and proofs of the splay tree's capabilities.

The last ten minutes is great. I've jumped to the spot:

Here's the algorithm that moves the given value to the root of the tree:

```
spltree_node* splay(spltree_node* x, int value) {
    spltree_node temp, *l, *r, *y;
    temp.left = temp.right = NULL;
    l = r = &temp;

    for (;;) {
        if (value < x->value) {
            if (x->left == NULL) break;
            if (value < x->left->value) {  // rotate right
                y = x->left;
                x->left = y->right;
                y->right = x;
                x = y;
                if (x->left == NULL) break;
            }
            r->left = x;  // link right
            r = x;
            x = x->left;
        } else if (value > x->value) {
            if (x->right == NULL) break;
            if (value > x->right->value) {  // rotate left
                y = x->right;
                x->right = y->left;
                y->left = x;
                x = y;
                if (x->right == NULL) break;
            }
            l->right = x;  // link left
            l = x;
            x = x->right;
        } else {
            break;
        }
    }

    // reassemble: hang the accumulated left and right trees off the new root
    l->right = x->left;
    r->left = x->right;
    x->left = temp.right;
    x->right = temp.left;
    return x;
}
```

]]>