Project TechM: 2013

Friday, December 27, 2013

Wavii

I touched the Github account for this project for the first time in about 5 months, just to begin the process of redoing the user interface. Somewhat simultaneously, I read this blog post from the 'front page' of Hacker News, in which the author compelled entrepreneurs with A BIG IDEA to take a step back and really invest in scoping out the competition:

It generally takes 20-30 hours to dig up all of the players in a space, which seems like a long time to spend looking. But, compared to spending 6-12 months building a product, 20-30 hours now isn't bad at all. Not putting in the hard yards looking is being blissfully ignorant... a thorough market analysis is not to be discouraging: it's a way to become an expert in a space and learn vicariously through competition.

I did not want to be 'blissfully ignorant.' I, too, wanted to 'learn vicariously.' And the more TechM incubated in my mind these last several months (as I waited until I settled into my first job, moved to a new city, bought a new laptop, etc.), the more confident I became in this idea's ability to solve a truly unsolved and important problem. I got legitimately excited, energized.

After tonight, now more than ever, it seems to me that this space does indeed contain an important problem.

However, it has already been solved, at least in the manner in which TechM aimed (in my wildest, most polished fantasies) to solve it.

So, enter Wavii.

As I quickly texted a friend tonight while trying to stifle my growing disappointment, Wavii is "literally everything techm could have aspired to be, and then some."

The funny thing is, I've known about Wavii for over 7 months now. I've known about Wavii since May 11, 2013, to be exact, when I began my second wave of preliminary market research. It didn't concern me too much back then though. It was certainly the most concerning existing product that I had found, but it still didn't worry me. TechM felt different enough.

But TechM has been evolving in my mind, as evidenced by its Trello board. And nowadays, what I feel would be an interesting, worthwhile problem to solve turns out to have its solution in this startup that was acquired by Google in April 2013. (So close, so far away.)

...

The bright side of all of this is that I am not defeated. I recently read a treatise by Rolf Dobelli proposing that people should "Avoid News: Towards a Healthy News Diet." While I do not agree with everything Dobelli argues for and against in his article, especially the idea that any attempt to keep up with daily news is pointless and futile, it gave me pause about my own goals for TechM.

After reading Dobelli, I tried to shift the target a bit so that TechM could hopefully avoid being irrelevant, disruptive, wasteful, limiting, 'toxic,' etc. (i.e., "TechM is going to be so smart and efficient that it will only deliver choice tidbits at a suitable pace not merely dictated by the 24-hour/daily news cycle!"). I started thinking about Longform, a curated site of 2,000-words-or-more articles that I only recently discovered, and how I could ditch puff pieces altogether in favor of recommending articles of significant length (and, one hopes, significant substance). I started thinking about how I could become more selective with the trending topics displayed - how I could use a topic's historical performance to better denote importance. But this vision butted heads with the 'alternate interface' path I had also been exploring in my mind. So maybe I was making excuses, maybe I was desperately flailing with A BIG IDEA that was slowly sinking underneath Dobelli's newfound ideological heft. And maybe Wavii was the final weight. (Or maybe I'm giving up too soon?)

All of this is to say, if the core idea of TechM is fading as quickly as it appears to be right now, a new incarnation of something may already be on the rise. Now to harness it, define it. Move on with the lessons learned here.

How to: focus on deep instead of broad thinking, insights over factoids, quality over quantity. Continue to combat heavy and time-wasting news consumption, but by adding meaningful words to be read instead of removing meaningless words.

Saturday, July 20, 2013

Live on Github!

"The Tech Machine: a daily news landscape" https://github.com/iconix/techm

Friday, July 19, 2013

v0.1

Here is a screenshot of the first version of TechM (shown on the 'Technology' section tab). The client has been implemented in Ruby on Rails and HTML/CSS, with intentional avoidance of JS for now (ending my philosophical tech debate temporarily). A few things you can't tell from the screenshot:

An asterisk (*) next to an entity indicates that the entity represents an entire cluster of similar entities. Hovering over the entity will display the rest of the cluster in a tooltip.
Hovering over any entity will display related article titles, also in the same tooltip (this 'feature' is only for creative purposes - it allows me to visualize the importance of article titles with respect to the context of each entity and trending topic, moving forward. I have no intention of showing article titles in a tooltip in a released version of the product, however.)

The second bullet point above really hints at the major advantage of this UI: allowing me to visualize the data in a more sophisticated way than I could previously. This is a nice step up from staring at the data in JSON format.

This version is not good enough for release for several reasons:

Obviously, the color scheme is fairly horrendous. I have yet to settle on a good one.
Many of these entities aren't useful without more context. Seeing a pile of entities related to each trending topic doesn't inspire me to continue clicking around and exploring the trending topic. This is a huge problem that I clearly need to improve upon.

Here are some thoughts for improvement:

Assign weights to clusters based on the number of entities in each cluster and the entity frequencies, then show only information related to the weightiest cluster: a combination of entities + related articles
Only collect the named entities that occur *directly after* the trending topic in an article title... ('meh' on this idea)
Use a POS tagger so that instead of showing entire article titles after showing important entities, just show verb phrases
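
A minimal sketch of the first idea, for illustration only. The formula (cluster size times total frequency) and the names `cluster_weight` and `weightiest_cluster` are assumptions, not something settled in the TechM code; the second sample cluster is made up.

```python
def cluster_weight(entity_frequencies):
    """Weight a cluster given as {entity name: frequency}.

    Assumed formula: number of entities * sum of their frequencies.
    """
    return len(entity_frequencies) * sum(entity_frequencies.values())


def weightiest_cluster(clusters):
    """Return the cluster (a dict of entity -> frequency) with the largest weight."""
    return max(clusters, key=cluster_weight)


# Illustrative data: the first cluster reuses counts from the PrefixSpan post,
# the second cluster is invented for contrast.
clusters = [
    {"North West": 11, "Baby North": 5, "Baby North West": 4},
    {"Kanye West": 3},
]
best = weightiest_cluster(clusters)
```

Multiplying size by frequency is just one option; a weighted sum would let the two factors be tuned separately.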

I'm also going to start using git branch to explore these options for how to display the data.

Overall though, it's exciting to have a working prototype of some sort! Even though it is very rough and needs more work.

Monday, July 15, 2013

Structure of the data layer

Section, Ttopic, Cluster, Entity, Article - these terms are mirrored as classes in my Python code. The image above is a visual representation of the structure of my code at this time.

As one can see, the structure is tree-like, with two exceptions noted along the right-hand side - trending topics (ttopics) and articles should probably not be restricted to belonging to a single section and entity respectively. I am still working on implementing this behavior, and it is not represented in the image above, but it would mean that leaf nodes on the ttopic and article levels would branch upwards as well.

Below is a quick description of the utility of each class:

Section
pulled directly from Google News, these are the available news sections: Top Stories, World, U.S., Business, Technology, Entertainment, Sports, Health, Science.

Trending topic (Ttopic)
also pulled directly from Google News but dynamically changing, these are the most popular news topics for each section at any given time (similar to Twitter's trending topics which show keywords 'tweeters' are using more or growing fastest)

Article
news articles pulled directly from the Google News RSS feed for each ttopic. instead of associating articles with a ttopic directly, they are first associated with the entities (discussed below) that were extracted from them (this is what causes repetitive use of a single article between entities).

Entity
short for the concept of a 'named entity' from the information extraction world. words from article titles categorized as the names of persons, organizations, and locations with the help of Stanford NER from The Stanford NLP Group (there are other predefined categories that are not currently used in this project). once identified, entities are tallied by frequency per ttopic using PrefixSpan.

Cluster
a wrapper for entities. groups similar entities for a ttopic to mute redundancy in the information displayed on the future client.
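
The class descriptions above could be sketched roughly as follows. This is not the actual TechM code: the field names, the `representative` helper, and the sample headline and section are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Article:
    title: str


@dataclass
class Entity:
    name: str
    frequency: int = 0
    articles: List[Article] = field(default_factory=list)


@dataclass
class Cluster:
    entities: List[Entity] = field(default_factory=list)

    def representative(self) -> Entity:
        """Assumed rule: the most frequent entity stands in for the cluster."""
        return max(self.entities, key=lambda e: e.frequency)


@dataclass
class Ttopic:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Section:
    name: str
    ttopics: List[Ttopic] = field(default_factory=list)


# The same Article object can hang off several entities, which is the
# "repetitive use of a single article" described above.
shared = Article("Hypothetical headline mentioning North West")
north = Entity("North", 16, [shared])
north_west = Entity("North West", 11, [shared])
topic = Ttopic("Kim Kardashian", [Cluster([north, north_west])])
section = Section("Entertainment", [topic])  # section chosen for illustration
```

Holding references rather than copies is what lets an article (or a ttopic) "branch upwards" to more than one parent without duplicating data.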

Tuesday, July 2, 2013

Technology soup

From Medium:

In comes node.js, it seemed cool and would achieve what I wanted so I decided to start again...
I decided to have a client-side app and render the views in the browser as it was the “in thing” to do. I used all the goodies available to me such as Backbone, jQuery, Bootstrap and component. This tied to a node server running express.js and mongodb. To top it all off I wanted to be cool so I wrote the whole darn thing in CoffeeScript.

Now that TechM's data layer is solid enough for prototyping purposes (ahem, sans testing...), I'm trying to figure out how the heck I should connect to and develop for the client application. My mind, my Google searches, and my Trello board are all starting to reflect the chaos quoted above, scarily enough. I'm kinda getting lost in the sauce.

The thing is, I want to strike a balance. I want to learn new web technologies this summer, but I also want a working prototype by the end of the week... and between CouchDB + Node.js + Express.js + Mustache + jQuery + Flat UI Kit + ???, none of which I really know, it's looking pretty bad.

What I do know is Ruby on Rails... but I've already developed extensively with it. I also think it's too heavy for what I want to accomplish. But I'm missing the comprehensiveness of Rails, I think. With node.js, every component is customizable and needs to be installed individually via npm, and I haven't yet found a recommended, standard configuration of tools. I haven't found any ridiculously thorough tutorials for node.js yet either (like Michael Hartl's exceptional Rails tutorial).

Friday, June 28, 2013

PrefixSpan to the rescue!

Just when I had buckled down to concoct some crazy algorithm from my own head.

Just when I had written off my internship experience in Kyoto as wholly irrelevant to my life beyond that summer of 2012.

As it turns out, I have already spent extensive time "combining entities" (as I clumsily put it on my Trello board), only back then it was defined in much more elegant and precise terms.

My Problem Space

So since last night, my Python code 1) gathers the trending topics of each major news section on the Google News aggregator, 2) using these topics as queries, executes a search via the GN RSS feed for each query, 3) parses the RSS feed of each topic for all associated article titles, 4) uses named entity recognition (NER) to extract significant words in the article titles, and 5) computes the occurrence frequency of each unique named entity over a single topic.
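
Steps 3 and 5 can be sketched with the standard library alone. This is a rough illustration, not the real TechM code: the sample feed is invented, and `naive_entities` is a hypothetical stand-in that grabs runs of capitalized words where the real pipeline uses named entity recognition.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample feed, so the sketch runs without touching the network.
SAMPLE_RSS = """<rss version="2.0"><channel><title>feed</title>
<item><title>North West makes her debut</title></item>
<item><title>Kim shares photo of North West</title></item>
<item><title>Why the name North</title></item>
</channel></rss>"""


def article_titles(rss_xml):
    """Step 3: pull the title of every <item> out of an RSS document."""
    root = ET.fromstring(rss_xml)
    return [item.findtext("title", default="") for item in root.iter("item")]


def naive_entities(title):
    """Hypothetical stand-in for NER: runs of capitalized words."""
    return re.findall(r"[A-Z][\w']*(?:\s+[A-Z][\w']*)*", title)


def entity_frequencies(titles, extract=naive_entities):
    """Step 5: count how often each extracted entity occurs for one topic."""
    counts = Counter()
    for title in titles:
        counts.update(extract(title))
    return counts


titles = article_titles(SAMPLE_RSS)
counts = entity_frequencies(titles)
```

Passing the extractor in as `extract` keeps the counting step independent of whichever NER tool is actually plugged in.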

The problem was that there was a lot of redundancy in each entity list. For example, my sample topic for testing was "Kim Kardashian," and (in honor of Kim and Kanye West's new baby) entities included:
- "North West" (6)
- "North" (3)
- "Baby North West" (2)
- "Baby North" (1)
- "North West Updates" (1)
- "Might Introduce Baby North West" (1)
- "Released Fake Pictures of Baby North West" (1)
- "Kanye West Explains Meaning Behind North" (1)

Wouldn't it be nice if we could get a more pronounced, less scattered sense of the buzz surrounding this new baby girl?

Enter: PrefixSpan

"PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth"
by Jian Pei, et al. (2001)

I honestly can't remember much about how the PrefixSpan algorithm works, and so I plan on revisiting this paper tonight. But let it be known: this is not a trivial algorithm - to think about or to implement.

Seriously, who knows how much time I might have spent (wasted?) rolling my own algorithm, or searching Google with all the wrong keywords, had I not recognized that I've seen this problem before.

Anyway, the tl;dr for the algorithm is as follows:
... [TODO] ...

Here's a diagram I created for my project's final presentation that represents the fundamental structure of the code:

And here is an illustrated example of how the algorithm works (courtesy of a slide from a previous intern working on the project I temporarily took over):

With a naive, out-of-the-box implementation of PrefixSpan that I found online, the example problem above is condensed to:
- "West" (24) *
- "North" (16)
- "North West" (11)
- "Baby" (6) *
- "Baby North" (5)
- "Baby North West" (4)
- "Baby West" (4) **
* includes other occurrences not listed above
** an unexpected result [UPDATE 7/2/13: modified algorithm to prevent this]

Yes, this list is nearly as long - but it is clearly more focused, and the counts provide sharper demarcations.
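
For reference, here is a compact sketch of the prefix-projection idea applied to the entity list above. It is not the implementation I found online, and it makes two assumptions: each entity is treated as a sequence of words, and each sequence is weighted by its count. Only the eight entities listed earlier are fed in, so the starred totals above (which include unlisted occurrences) will come out lower here.

```python
from collections import defaultdict


def prefixspan(sequences, min_support=1):
    """Mine word patterns from a list of (words_tuple, weight) pairs.

    Returns {pattern_tuple: support}, where support is the total weight of
    the sequences that contain the pattern as a (possibly gapped) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the part of
        # each sequence that remains after matching the current prefix.
        support = defaultdict(int)
        next_projection = defaultdict(list)
        for seq_idx, start in projected:
            words, weight = sequences[seq_idx]
            seen = set()
            for pos in range(start, len(words)):
                word = words[pos]
                if word not in seen:  # count each sequence once per word
                    seen.add(word)
                    support[word] += weight
                    next_projection[word].append((seq_idx, pos + 1))
        for word, count in support.items():
            if count >= min_support:
                pattern = prefix + (word,)
                results[pattern] = count
                mine(pattern, next_projection[word])

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


entities = [
    ("North West", 6),
    ("North", 3),
    ("Baby North West", 2),
    ("Baby North", 1),
    ("North West Updates", 1),
    ("Might Introduce Baby North West", 1),
    ("Released Fake Pictures of Baby North West", 1),
    ("Kanye West Explains Meaning Behind North", 1),
]
patterns = prefixspan([(tuple(name.split()), count) for name, count in entities])
```

On this input, `patterns[("North",)]` is 16, `patterns[("North", "West")]` is 11, `patterns[("Baby", "North")]` is 5, and `patterns[("Baby", "North", "West")]` is 4, matching the list above. Because gaps are allowed, `("Baby", "West")` also shows up with support 4, which appears to be where the unexpected result comes from.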

Lesson Learned

The value of gathering exposure to all kinds of algorithms.

Thursday, June 27, 2013

I am a robot

...sigh. How does newsmap do it..? Hmm. I don't think it's my use of the RSS feed but rather the rate of my use of the RSS feed today. I was doing a lot of iterating and testing.

UPDATE: yay, the block was released by morning! so...

Lesson Learned

Impose a rate limit on myself before a rate limit is imposed on me.
Iterate and test on local copies.
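
Both lessons can be sketched in a few lines. This is a minimal illustration under assumptions: `Throttle` and `fetch_cached` are hypothetical names, the fetcher is passed in as a function, and the demo uses a fake fetcher and a placeholder URL so nothing touches the network.

```python
import hashlib
import tempfile
import time
from pathlib import Path


class Throttle:
    """Blocks so that calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def fetch_cached(url, fetch, cache_dir, throttle):
    """Return the local copy of url if one exists; otherwise fetch it
    (respecting the throttle) and save it for next time."""
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".xml"
    path = Path(cache_dir) / name
    if path.exists():
        return path.read_text(encoding="utf-8")
    throttle.wait()
    body = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    return body


# Demo with a fake fetcher: the second call is served from the local copy.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<rss></rss>"


with tempfile.TemporaryDirectory() as cache_dir:
    throttle = Throttle(min_interval=0.0)
    first = fetch_cached("http://example.com/feed", fake_fetch, cache_dir, throttle)
    second = fetch_cached("http://example.com/feed", fake_fetch, cache_dir, throttle)
```

Injecting the clock and sleep functions makes the throttle testable without real waiting.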

Trello

I grew fond of Trello for organizing workload and assigning responsibility within my five-person team during CS 210, and so I have returned to it for my current project, despite being a team of one this time around.

New board started today!

Wednesday, June 26, 2013

The first pivot

On May 21, a random Tuesday*, I discovered newsmap, an application created in 2004 that sought a way to map out the news landscape via the Google News aggregator.

It was pretty demoralizing. It even used the term "news landscape" like I did.

Fortunately, my morale was lowered for only about a minute or two (literally, 1-2 minutes. I was actually surprised and proud of how quickly I bounced back. But these self-respect issues are the topic of another medium...).

I realized that I was completely unaware of the comprehensiveness and power of the GN aggregator. The fact that I was planning to aggregate news articles on my own via multiple, independent RSS feeds became laughable - the work was already done for me, by Google, on a level that far surpassed anything my solo development could have produced.

Now that I was more aware of GN, I grew excited about the possibilities for me to explore this treasure trove of news data. Even if I was beaten to the punch on creating this news landscape I envisioned (beaten by nine years at that), I could still mess around with the data as a neat learning experience.

Furthermore, I found that I had several unexplored ideas (e.g., infinite tweetflight!) that I could still pursue over my summer break. So maybe TechM was concluding only 10 days after the birth of this blog, before even one line of code was written, but I felt that I could easily re-purpose this blog to document my other ideas. This blog would be an authentic record of the life of a creative, curious coder through both the successes and the failures.

...and the pivots. Fast-forward to today. I am now on summer break after graduating with my bachelor's degree on June 16. I followed through and started playing around with GN this past week, and I don't think TechM is dead anymore.

* how can I possibly remember that particular date from a month ago? Kudos to the "revision history" feature in Google Docs that remembered when I placed newsmap in my project notes :)

The Pivot

First, let me write about newsmap a little more.

The application is certainly eye-catching and very cool. I appreciate the bold colors and bold fonts and its attempt at directing attention to the most important stories of the moment.

But once you actually start trying to consume the information, newsmap is overwhelming. There is too much data on the screen at once, the use of primary colors strains the eyes, and it is not obvious enough what to look at (and even if you do find some temporary focus, it is quickly drawn away by everything else going on on the page).

I think I can do this better. I think I can find a simpler, more intuitive design for displaying a news landscape.

First off, my design will not have as much text on the screen at once. Instead of displaying full article titles, my plan has always been to display key entities - persons, organizations, products, events, etc. More information on these entities will be a click away.

When I investigated the Google News site closer, I noticed that Google already has a "trending topics" feature for each major news section. It is just a simple list of topics, but it exists. So I grew discouraged momentarily, thinking that the aggregator already had my idea covered. However, the topics list is just an afterthought for GN. I think there is more potential to be found here. I wish to build something that holds trending topics at its core, at its foundation. At the same time, once again, at some level the work has already been done; at this point in the project, I don't need to roll my own trending topics list.

So here's the pivot: TechM will employ GN's trending topics listing as a baseline. Then, it will go a level deeper and build an index, as first planned, but around the entities associated with each trending topic provided by GN. I like this idea because I trust the data GN delivers, and so directly using what it has deemed a trending topic (as opposed to using what I might have deemed a trending topic) provides a very solid information baseline for my product. Then with this established, I begin the work of replicating this trending topics implementation for the "subsections" (i.e., topics) provided by GN. Oh, and I'm also not limited to building this just for tech blogs (so TechM makes less sense... but it's okay because it's just a project codename).

The pivot could be considered minor to an outsider, but it was instrumental to me and my ability to get past the idea that I was merely being derivative. I feel excited about this project once again.

Now I have some Python code down, things are looking promising, and all is well. I hope to write a little bit about the coding side of things on this blog as well.

Lessons Learned

1) Make full use of the tools, packages, and data forms already available to me. Don't bother reinventing the wheel, unless the current wheel is significantly restricting my progress.

2) Just because something has been done before does not mean I can't do it better. Especially if "it" was last iterated on in 2005, and the brains behind the operation has moved on to bigger and better things like Flipboard (an app I love, by the way).

3) Try to remember that I am (at the present) coding a prototype, a minimum viable product... so don't get too carried away with software architecture

Thursday, May 30, 2013

The CS 210 Experience

CS 210 is...

Software Engineering + Product Management +
Founding a Company + Managing Group Dynamics

CS 210 provides hands-on experience with...

- Knowledge and rationale capture
- Market awareness - brainstorming, need finding, and benchmarking
- Rapid prototyping
- Agile development (Scrum framework)
- Readme Driven Development (RDD)
- Distributed source control (Git & Github)
- Issue tracking (Trello)
- Team communication (email & Facebook group)
- Documentation for new employees (Google Drive) + potential customers (blogging)

CS 210 requires from its student teams that they...

- Build team identity (team name, logo, members pic + bio, communication channels, product definition/mission statement)
- Develop and iterate on product ideas aligned with sponsoring company's (SAP) product theme
- Engage in user feedback driven prototyping ("pink bagel" testing)
- Present to company (SAP) liaisons at various product stages (in Palo Alto and on campus)
- Convey how the team would attract desirable new hires ("closing a candidate")
- Formally demo software twice - at the course halfway mark and at the end of the course
- Present to an unaffiliated corporate audience (ShopKick)
- Present to company (SAP) at international headquarters (during spring break in Walldorf, Germany)
- Compose necessary documentation to onboard a new hire efficiently
- Maintain a blog presence throughout product development
- Enter into a contract with company liaisons on what team intends to deliver by course end
- Launch product to a real-world audience

My personal takeaways...

- Just because you like someone doesn't mean you'll like working with them on a long-term basis
- Around The Corner (ATC) would have benefited from more frequent user feedback during development - we developed in a vacuum most of the time
- The ATC team should have elected a media relations manager in its later stages to organize outreach
- User interface design is HUGE
- Having a solid team and product identity promotes team bonding and unity (why t-shirts matter)
- Assigning team roles and responsibility over particular product aspects is crucial to really getting things done
- Rotating a few key members through the acting role of "CEO" can be effective in managing burnout
(section may be updated in future)

Saturday, May 11, 2013

What is TechM?: Vision

First of all, "TechM" stands for "The Tech Machine." You'll see why if you read the proposal linked to below.

In preparation for presenting TechM to our SAP liaisons (as discussed in "The back story"), I prepared proposal documentation outlining my entire idea:

"The Tech Machine: Proposal" (now outdated)

In this proposal, TechM was originally thought of as a way to make the benchmarking process (as heavily championed by the CS 210 teaching team) more streamlined and effective. It was going to be a product that helps you discover pre-existing products in a space more quickly, an app for discovering the next up-and-coming apps or products or ideas, an app that can organize tech blogs and track up-and-comers based on hype and popularity (i.e., a HypeM for tech blogs). Then the idea was to eventually let users hashtag or link different up-and-comers together to allow for a search tool that lets you navigate pre-existing products or products within a certain space.

The idea has evolved greatly since this proposal.

My teammates pointed out that since HypeM is a web application, perhaps this idea also makes more sense as a web application rather than an Android or iOS app. I agree, especially since I believe that native applications will eventually fall out of favor, and cross-platform mobile applications will soon dominate the market.

An observation that I made is that the idea outlined above has two core features: (1) aggregation and organization of tech blogs, and (2) comprehensive search of the result set from this aggregation and organization. Feature (1) is a mammoth task unto itself, so why not focus on that as the sole core instead?

A third observation comes from my life as a college student these past four years. When I lived at home with my parents and younger sister, the living room television was always on (unless everyone was out or asleep), even if no one in particular was watching it. And about 95% of the time, the television was tuned to the news - whether it was the 24-hour news cycle of CNN or MSNBC, CBS Evening News or W-MAR ABC 2 local news, Dateline or Nightline. And occasionally, the television was tuned to more "frivolous" news like SportsCenter or Entertainment Tonight. Bottom line: news was the constant chatter in the background of my pre-college life.

Once I came to college and lost easy television access, that chatter vanished entirely. I had to actively search for all my news for the first time in my life, and I hated having to do it daily. I hated the idea of having to scroll and click through dozens of articles a day, teasing out 10 or so articles that may actually interest me. Even the new mobile news apps popping up everywhere (Pulse, Flipboard) were too verbose and time-consuming most of the time to satisfy my needs. And I didn't like the idea of picking out a few "interests" and "passions" like technology or music to read about - rather, I was more interested in being aware of whatever big things were happening in the world since the last time I checked in. So with no solution in sight, I sank into my Stanford bubble and became grossly unaware of the world at large. If news reached me, it reached me by mistake - like I happened to see it in my Facebook news feed, or someone mentioned something at a meal. And I was not alone in this behavior among my peers - some of my peers had never been very aware of the news to begin with. Their parents weren't like mine, attached to world affairs.

These three observations made after my initial TechM proposal shaped the idea even further. Now, TechM aims to:

Be an HTML5 web application that is compatible with mobile devices
Aggregate and organize technology-related (to begin with) news stories. But much more than that: extract the entities - persons, organizations, products, events - most vital to each story and build an index that can determine a hype and buzz factor around these entities within any given time frame (the most important time frame being a single day at first).
Build an experience that delivers the entire daily news landscape to users succinctly. This product is not a destination, but rather a pit stop designed to get the user in, out, and on with their day as efficiently as possible.

The back story

February 11, 2013:

First small group meeting (SGM) for my CS 210 team. CS 210 (Software Project Experience with Corporate Partners) is a two quarter-long sequence of classes that fulfills the senior project requirement for CS majors at Stanford. My team was sponsored by SAP, a multinational, multi-billion dollar corporation that provides enterprise software solutions, and we called ourselves the "Socially Awesome Penguins" (or "SAPenguins" for short). Cute, right? I came up with that name.

On February 11, we were just coming away from a pretty disappointing reception by a panel of venture capitalists on our first product idea - mobile-to-mobile screen sharing. The VCs (invited by the excellent Jay Borenstein to provide expert opinions to all the project teams in the class) called our concept uninspired and unambitious. They thought we had a lot more potential to do something great.

So in that SGM, we committed as a team to seeking out fresh, new project ideas to present to our corporate liaisons from SAP on February 20. We ditched the screen sharing idea, and we ditched all the other half-baked ideas that screen sharing had beaten out over the last month. After a week-and-a-half of intense brainstorming, featuring a couple of long and arduous team brainstorming sessions in Old Union, we settled on presenting our liaisons with two new ideas - code-named "SmartSense" and "TechM."

February 20, 2013:

Long story a little bit shorter, SAP liked the "SmartSense" idea better because it tied in with the project theme they had originally given us - letting the mobile device become a proxy for the identity and/or context of a human user. "SmartSense" gradually evolved into "Around the Corner" and became our final project direction - essentially, a mobile application that allows users to observe and utilize the paths along their daily commutes to find new venues they haven't discovered yet, despite passing by these venues all the time, despite them being "around the corner."

"TechM" did not connect to the SAP project theme, and given that they were our financial backers, it was reasonably tossed out without further thought. But "TechM" was my brainchild from the marathon brainstorming sessions, and I still saw great potential in the idea. This senior project class was not to be the right platform for turning this idea into a product, and so I am free to pursue it as an individual.

Read more about the vision behind TechM.