Thursday, March 20, 2014

Elasticsearch

How nice it is to decide that I need to either amateurishly build
or find an implementation of some advanced data structure for TechM
that I would need to then amateurishly build on top of to fully support this project,
a problem which caused forward motion to cease entirely
due to initial dev designing and general confusion and lazy reluctance,
only to serendipitously discover that a comprehensive guide -
on a powerful service using that same advanced data structure at its core (and then some)
that I had discovered a month ago but hadn't figured out what to make of at the time -
has been released today and should guide me on how to solve my problem entirely
(and then some).

How nice - leveraging technology to solve an existing problem,
rather than some premature optimization.

Friday, January 31, 2014

Female Founders Conference application

Please tell us about something you've achieved that you consider an indicator of your ability.

From January to June 2013, I led client application development on a five-person team. We successfully shipped a version one product online and to the Google Play store.

Our team sought to leverage both geolocation and demonstrated user preference (e.g., past attendance) to provide students with a more effective means of finding relevant, on-campus events. Preexisting methods relied heavily on mass blasts, which often proved too impersonal and created an overwhelming flow of promotional content from campus groups and departments. Our solution provided a centralized web portal for promoters to place content, and a mobile app for students to receive content tailored to what we gathered about them.

This idea was incubated for my senior project at Stanford and is, by far, my most hands-on experience with entrepreneurship to date. It demanded end-to-end vision - from knowledge and rationale capture, market awareness, brainstorming, need finding, and rapid prototyping; to agile development, team communication, and incremental documentation.

However, what impressed me the most about the process of shipping something real was my personal development. My four teammates were smart, highly capable men whom I respect, but I learned how to refrain from allowing this respect to diminish my own perceived value on the team. I learned, despite my natural introversion, how to communicate effectively and with authority on behalf of the team.

I have achieved modestly in the past, but I need to keep striving and learning in order to reach my founder dream.

Friday, December 27, 2013

Wavii

I touched the GitHub account for this project for the first time in about 5 months, just to begin the process of redoing the user interface. Somewhat simultaneously, I read this blog post from the 'front page' of Hacker News, in which the author urged entrepreneurs with A BIG IDEA to take a step back and really invest in scoping out the competition:
It generally takes 20-30 hours to dig up all of the players in a space, which seems like a long time to spend looking. But, compared to spending 6-12 months building a product, 20-30 hours now isn't bad at all. Not putting in the hard yards looking is being blissfully ignorant... a thorough market analysis is not to be discouraging: it's a way to become an expert in a space and learn vicariously through competition.
I did not want to be 'blissfully ignorant.' I, too, wanted to 'learn vicariously.' And the more TechM incubated in my mind these last several months (as I waited until I settled into my first job, moved to a new city, bought a new laptop, etc.), the more confident I became in this idea's ability to solve a truly unsolved and important problem. I got legitimately excited, energized.

After tonight, now more than ever, it seems to me that this space is indeed an important problem.

However, it has already been solved, at least in the manner in which TechM aimed (in my wildest, most polished fantasies) to solve it.

So, enter Wavii.

As I quickly texted a friend tonight while trying to stifle my growing disappointment, Wavii is "literally everything techm could have aspired to be, and then some."

The funny thing is, I've known about Wavii for over 7 months now. I've known about Wavii since May 11, 2013, to be exact, when I began my second wave of preliminary market research. It didn't concern me too much back then though. It was certainly the most concerning existing product that I had found, but it still didn't worry me. TechM felt different enough.

But TechM has been evolving in my mind, as evidenced by its Trello board. And nowadays, what I feel would be an interesting, worthwhile problem to solve turns out to have its solution in this startup that was acquired by Google in April 2013. (So close, so far away.)













...

The bright side of all of this is that I am not defeated. I recently read a treatise by Rolf Dobelli proposing that people should "Avoid News: Towards a Healthy News Diet." While I do not agree with everything Dobelli argues for and against in his article, especially the idea that any attempt to keep up with daily news is pointless and futile, it gave me pause about my own goals for TechM.

After reading Dobelli, I tried to shift the target a bit so that TechM could hopefully avoid being irrelevant, disruptive, wasteful, limiting, 'toxic,' etc. (i.e., "TechM is going to be so smart and efficient that it will only deliver choice tidbits at a suitable pace not merely dictated by the 24-hour/daily news cycle!"). I started thinking about Longform, a curated site of 2,000-words-or-more articles that I only recently discovered, and how I could ditch puff pieces altogether in favor of recommending articles of significant length (and, one hopes, significant matter). I started thinking about how I could become more discriminating with the trending topics displayed: how I could use a topic's historical performance to better denote importance. But this vision butted heads in a major way with the 'alternate interface' path I had also been exploring in my mind. So maybe I was making excuses, maybe I was desperately flailing with A BIG IDEA that was slowly sinking underneath Dobelli's newfound ideological heft. And maybe Wavii was the final weight. (Or maybe I'm giving up too soon?)

All of this is to say, if the core idea of TechM is fading as quickly as it appears to be right now, a new incarnation of something may already be on the rise. Now to harness it, define it. Move on with the lessons learned here.

How to: focus on deep instead of broad thinking, insights over factoids, quality over quantity. Continue to combat heavy and time-wasting news consumption, but by adding meaningful words to be read instead of removing meaningless words.

Friday, July 19, 2013

v0.1













Here is a screenshot of the first version of TechM (shown on the 'Technology' section tab). The client has been implemented in Ruby on Rails and HTML/CSS, with intentional avoidance of JS for now (ending my philosophical tech debate temporarily). A few things you can't tell from the screenshot:
  • An asterisk (*) next to an entity indicates that the entity represents an entire cluster of similar entities. Hovering over the entity will display the rest of the cluster in a tooltip.
  • Hovering over any entity will display related article titles, also in the same tooltip (this 'feature' is only for creative purposes - it allows me to visualize the importance of article titles with respect to the context of each entity and trending topic, moving forward. I have no intention of showing article titles in a tooltip in a released version of the product, however.)
The second bullet point above really hints at the major advantage of this UI: allowing me to visualize the data in a more sophisticated way than I could previously. This is a nice step up from staring at the data in JSON format.

This version is not good enough for release for several reasons:
  • Obviously, the color scheme is fairly horrendous. I have yet to settle on a good one.
  • Many of these entities aren't useful without more context. Seeing a pile of entities related to each trending topic doesn't inspire me to continue clicking around and exploring the trending topic. This is a huge problem that I clearly need to improve upon.
Here are some thoughts for improvement:
  • Assign weights to clusters based on # of entities in cluster & entity frequencies and show only information related to weightiest cluster, a combination of entities + related articles
  • Only collect the named entities that occur *directly after* the trending topic in an article title... ('meh' on this idea)
  • Use a POS tagger so that instead of showing entire article titles after showing important entities, just show verb phrases
I'm also going to start using git branch to explore these options for how to display the data.
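To make the first idea above a little more concrete, here is a rough Python sketch of cluster weighting. It is only one possible reading of "# of entities in cluster & entity frequencies": the formula (cluster size plus summed frequencies), the `size_factor` knob, and the representation of a cluster as a plain dict of entity name to frequency are all assumptions made for illustration, not what TechM actually does.

```python
from typing import Dict, List

# Hypothetical representation: a cluster is a dict of entity name -> frequency.
Cluster = Dict[str, int]


def cluster_weight(cluster: Cluster, size_factor: float = 1.0) -> float:
    """Combine cluster size and entity frequencies into a single score.

    The additive formula is an assumption; any monotonic combination
    of the two signals would fit the idea equally well.
    """
    return size_factor * len(cluster) + sum(cluster.values())


def weightiest_cluster(clusters: List[Cluster]) -> Cluster:
    """Return the cluster with the highest weight."""
    return max(clusters, key=cluster_weight)


# Made-up example data for one trending topic.
clusters = [
    {"Google": 5, "Google Inc": 2},  # weight: 2 entities + 7 mentions = 9
    {"Apple": 3},                    # weight: 1 entity + 3 mentions = 4
]
top = weightiest_cluster(clusters)
```

With a score like this, the UI would only need to render `top` (plus its related articles) for each trending topic instead of the whole pile of entities.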

Overall though, it's exciting to have a working prototype of some sort! Even though it is very rough and needs more work.

Monday, July 15, 2013

Structure of the data layer
















Section, Ttopic, Cluster, Entity, Article - these terms are mirrored as classes in my Python code. The image above is a visual representation of the structure of my code at this time.

As one can see, the structure is tree-like, with two exceptions noted along the right-hand side - trending topics (ttopics) and articles should probably not be restricted to belonging to a single section and entity respectively. I am still working on implementing this behavior, and it is not represented in the image above, but it would mean that leaf nodes on the ttopic and article levels would branch upwards as well.

Below is a quick description of the utility of each class:

Section
pulled directly from Google News, these are the available news sections: Top Stories, World, U.S., Business, Technology, Entertainment, Sports, Health, Science.

Trending topic (Ttopic)
also pulled directly from Google News but dynamically changing, these are the most popular news topics for each section at any given time (similar to Twitter's trending topics which show keywords 'tweeters' are using more or growing fastest)

Article
news articles pulled directly from the Google News RSS feed for each ttopic. instead of associating articles with a ttopic directly, they are first associated with the entities (discussed below) that were extracted from them (this is why a single article can appear under several entities).

Entity
short for the concept of a 'named entity' from the information extraction world. words from article titles categorized as the names of persons, organizations, and locations with the help of Stanford NER from The Stanford NLP Group (there are other predefined categories that are not currently used in this project). once identified, entities are tallied by frequency per ttopic using PrefixSpan.

Cluster
a wrapper for entities. groups similar entities for a ttopic to mute redundancy in information displayed on future client.
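For illustration, the five classes described above might look roughly like this in Python. This is a minimal sketch and not the actual TechM code: the field names, the use of dataclasses, the `unique_articles` helper, and the example headline and URL are all assumptions. It does show how one article can hang off several entities, which is the exception to the tree shape mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Article:
    title: str
    url: str


@dataclass
class Entity:
    name: str
    frequency: int = 0  # how often the entity appears for its ttopic
    articles: List[Article] = field(default_factory=list)


@dataclass
class Cluster:
    entities: List[Entity] = field(default_factory=list)


@dataclass
class Ttopic:
    name: str
    clusters: List[Cluster] = field(default_factory=list)


@dataclass
class Section:
    name: str
    ttopics: List[Ttopic] = field(default_factory=list)


def unique_articles(ttopic: Ttopic) -> List[Article]:
    """Collect a ttopic's articles once each, even when entities share them."""
    seen: List[Article] = []
    for cluster in ttopic.clusters:
        for entity in cluster.entities:
            for article in entity.articles:
                if article not in seen:
                    seen.append(article)
    return seen


# Made-up example: one article shared by two entities.
shared = Article("Google acquires Wavii", "http://example.com/a")
google = Entity("Google", frequency=2, articles=[shared])
wavii = Entity("Wavii", frequency=1, articles=[shared])
topic = Ttopic("Wavii", clusters=[Cluster([google]), Cluster([wavii])])
tech = Section("Technology", ttopics=[topic])
```

Because articles are reached through entities, anything that lists articles per ttopic needs a de-duplication step like `unique_articles`.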

Tuesday, July 2, 2013

Technology soup

From Medium:
In comes node.js, it seemed cool and would achieve what I wanted so I decided to start again...
I decided to have a client-side app and render the views in the browser as it was the “in thing” to do. I used all the goodies available to me such as Backbone, jQuery, Bootstrap and component. This tied to a node server running express.js and mongodb. To top it all off I wanted to be cool so I wrote the whole darn thing in CoffeeScript.
Now that TechM's data layer is solid enough for prototyping purposes (ahem, sans testing...), I'm trying to figure out how the heck I should connect to and develop for the client application. My mind, my Google searches, and my Trello board are all starting to reflect the chaos quoted above, scarily enough. I'm kinda getting lost in the sauce.

The thing is, I want to strike a balance. I want to learn new web technologies this summer, but I also want a working prototype by the end of the week... and between CouchDB + Node.js + Express.js + Mustache + jQuery + Flat UI Kit + ???, none of which I really know, it's looking pretty bad.

What I do know is Ruby on Rails... but I've already developed extensively with it. I also think it's too heavy for what I want to accomplish. But I'm missing the comprehensiveness of Rails, I think. With node.js, every component is customizable and needs to be installed individually via npm, and I haven't yet found a recommended, standard configuration of tools. I haven't found any ridiculously thorough tutorials for node.js yet either (like Michael Hartl's exceptional Rails tutorial).