
If only life were so simple that it could be enclosed in a series of two-digit categories.

The problem with such a strongly hierarchical system is that it fails when some document, note, picture, etc. would be useful to keep in multiple locations. Obviously we can introduce links between objects, but I believe tags are more comfortable to use.

Hierarchical systems and folders are artifacts of the physical world, in which a single object (a tool, pipe, screw, or book) cannot be in two places at the same time. In the abstract world of computers, a note about a new game could be in #games, #fun, #to-check, #interesting-ideas, #great-graphics, etc.
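
To make the "one note, many tags" idea concrete, here is a minimal sketch of a tag index. The note names and the `find` helper are made up for illustration; nothing here is tied to a particular note-taking tool.

```python
from collections import defaultdict

# Hypothetical notes and tags, purely for illustration.
notes = {
    "new-game.md": {"games", "fun", "to-check", "great-graphics"},
    "board-game-night.md": {"games", "fun"},
    "shader-tutorial.md": {"great-graphics", "to-check"},
}

# Inverted index: tag -> set of notes carrying that tag.
index = defaultdict(set)
for note, tags in notes.items():
    for tag in tags:
        index[tag].add(note)

def find(*tags):
    """Return the notes that carry every one of the given tags."""
    sets = [index.get(tag, set()) for tag in tags]
    return set.intersection(*sets) if sets else set()

print(sorted(find("games")))
print(sorted(find("games", "great-graphics")))
```

The same note shows up under every tag it carries, with no copy and no link to maintain.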



Personally - I've come to the absolute opposite opinion. To be overly blunt:

"Tags fucking suck."

They are literally the worst possible way to store and organize your information, and they are only useful when you just want a random sampling of a category - not a specific document or piece of information. Ex: Great for social media or looking at old photos or just playing a song from a genre you like, bad (fucking terrible) for organization and structure.

---

Hierarchical structures have downsides, but the exact thing you complain about (artifacts of the physical world) is exactly their strength... You have a body that is adapted to the physical world - routing and navigation through a series of ordered steps is a VERY well developed human skill. We are primed to be able to remember things like:

- Go left at the tree,

- Straight until you hit road

- Right at the road

- continue until you hit a red house with a big garden

- etc...

That skill set maps directly into the hierarchical system of folders:

- Find the "documents" folder on the desktop

- scroll down to "my super sweet project"

- open that folder

- Find the "icons" folder

- open it and double click "exactly_the_thing_you_wanted.jpg"

------

You can absolutely still make horrible, unorganized messes - but if done well (ex: this article is actually a fairly good system) it's a much, much better system than tags.


Your example about navigating roads has nothing to do with hierarchy. And, in fact, most road networks are not hierarchical and the interconnectedness is their strength:

https://en.wikipedia.org/wiki/A_City_Is_Not_a_Tree

Your brain doesn't organize information hierarchically. Let's say I ask you:

1. Name a band that starts with "B".

2. Name a band from England.

3. Name a rock band.

If your brain stored bands in a hierarchy, you'd only be able to come up with "The Beatles" as an answer for one of those questions. You'd have to figure out whether to categorize the Beatles by name, location, or genre and it would be absent from the other categories.


Or you'd have to do an inefficient search in order to find something that matched, which would be slow, but not impossible.

Or you'd have to maintain several redundant hierarchies.

(I agree with you that our subjective experience and speed in thinking of things is evidence that we probably don't mentally represent things this way.)
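
The two workarounds mentioned above (a slow scan of one hierarchy, or several redundant hierarchies) can be sketched roughly like this. The band list and the helper names are illustrative assumptions, not a claim about how memory works.

```python
# Illustrative data only.
bands = [
    {"name": "The Beatles", "country": "England", "genre": "rock"},
    {"name": "Blondie", "country": "USA", "genre": "new wave"},
    {"name": "Kraftwerk", "country": "Germany", "genre": "electronic"},
]

def initial(name):
    """First letter of the name, ignoring a leading 'The '."""
    return (name[4:] if name.startswith("The ") else name)[0].upper()

def build_index(key):
    """Group band names by key(band); each call builds one more redundant 'hierarchy'."""
    index = {}
    for band in bands:
        index.setdefault(key(band), []).append(band["name"])
    return index

# Option 1: several redundant hierarchies, one per question you might ask.
by_genre = build_index(lambda b: b["genre"])
by_country = build_index(lambda b: b["country"])
by_initial = build_index(lambda b: initial(b["name"]))

# Option 2: keep only the genre tree and walk every branch for other questions.
def scan(predicate):
    by_name = {b["name"]: b for b in bands}
    return [n for names in by_genre.values() for n in names if predicate(by_name[n])]

print(by_initial["B"])
print(scan(lambda b: b["country"] == "England"))
```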


Strong disagree. https://blog.oup.com/2016/11/evolution-human-memory/ https://www.scientificamerican.com/article/how-gps-weakens-m... Navigation memory is the most core type of memory- most other forms of memory evolved later. There’s a reason why GPS usage is correlated with dementia. Human memory actually evolved out of a sense of navigation.

I strongly agree with the commenter who likened the hierarchical folder structure to the physical world; it’s a much more direct mapping of how human memory actually works.

Humans aren’t actually magical AI computers of energy floating in midair, they’re made of physical meat. Even if some abstract concepts (like tags) may make more theoretical sense (I agree with people who say that certain things can be classified in 2 different locations), it may not play to the actual structure and advantages of the human brain.


> I strongly agree with the commenter who likened the hierarchical folder structure to the physical world; it’s a much more direct mapping of how human memory actually works.

But the physical world isn't hierarchical at all. It's spatial. It's much more like a graph than a tree where there are usually multiple paths between any two points.

If you have to pick up your kid from school and stop at the grocery store for milk on the way home from work, you probably do not:

1. Drive to school and get kid.

2. Drive back to work.

3. Drive to grocery store to get milk.

4. Drive back to work.

5. Drive home from work.

Or:

1. Drive to school and get kid.

2. Drive to grocery store to get milk.

3. Drive back to school.

4. Drive back to work.

5. Drive home from work.

If the physical world was hierarchical, all navigation through multiple waypoints would look like this kind of stack pushing and popping.
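
As a rough sketch of that stack-like behaviour: in a tree, the only way between two places is up to their common ancestor and back down. The toy layout below (with "work" as the root) is an assumption made up for the example.

```python
# Hypothetical tree: "work" is the root and every other place hangs off it.
parent = {"work": None, "school": "work", "grocery": "work", "home": "work"}

def ancestors(node):
    """Return the chain from node up to the root, node first."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def tree_path(a, b):
    """The unique path from a to b: climb to the common ancestor, then descend."""
    up = ancestors(a)
    down = ancestors(b)
    common = next(n for n in up if n in down)
    return up[: up.index(common) + 1] + down[: down.index(common)][::-1]

def route(stops):
    """Chain the tree paths between consecutive stops."""
    full = [stops[0]]
    for a, b in zip(stops, stops[1:]):
        full += tree_path(a, b)[1:]
    return full

print(route(["work", "school", "grocery", "home"]))
# ['work', 'school', 'work', 'grocery', 'work', 'home']
```

Every leg passes back through the root, which is the "pop, then push" pattern described above.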


I'm telling you that all navigation through multiple waypoints DOES usually look like this kind of pushing and popping (just on a massive scale).

So here's a possible day for me:

I work at corporate office A, it's near the highway entrance. I have to pick up my kid - they are at school down the local street heading west. I travel west and pick up my child.

Now I need milk. The closest grocery is back east, just past my office, so I drive back by my office and pull into the grocery.

Then I load up and set off for home. To get there, I need to take the highway to the north, so I head back past my office on that same street and get on the highway using the closest entrance.

I take the highway until I'm home.

---

That sure seems like a normal day to me. It's exactly what you said folks would never do, but it's super common. And it's hardly something modern that was introduced with cars - there's a cost function to travelling anywhere in the world, and people like to connect using low cost paths - which tends to model a folder hierarchy.


Sure, some routes end up being tree-like, because trees are a subset of graphs. But just as often you see waypoints like:

1. Leave the office.

2. Drive to the grocery store.

3. Drive to school.

4. Drive home.

Where there is no backtracking between them.

> And it's hardly something modern that was introduced with cars - there's a cost function to travelling anywhere in the world, and people like to connect using low cost paths - which tends to model a folder hierarchy.

A tree doesn't minimize the cost for any given trip or for the aggregate cost of all trips between pairs of points. Because a tree has only a single path between any two points, it has the highest possible aggregate trip cost for all possible trips while still being connected.

What it does minimize is the cost of building and maintaining the paths. Since there is only a single path between any pair of points, it has the fewest redundant edges. If you were tasked with building a road network for a country and your sole goal was to minimize the amount of concrete used, you'd build a tree.

If your only goal was to minimize the aggregate distance all travellers took, you'd build a fully-connected graph where every pair of destinations has a dedicated road.

In practice, road networks are designed to minimize both road maintenance costs and drive time and balance those opposing forces. The result is more connected than a tree but less connected than a complete graph, something like a semilattice.
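
The trade-off can be checked on a toy example. The sketch below uses five made-up towns and treats every road as one "hop", which is a simplifying assumption; real road costs are weighted.

```python
from collections import deque
from itertools import combinations

def total_hops(nodes, edges):
    """Sum of shortest-path hop counts over all unordered pairs (graph must be connected)."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for start in nodes:
        # Breadth-first search gives shortest hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
    return total // 2  # each unordered pair was counted from both ends

nodes = list("ABCDE")
star = [("A", n) for n in "BCDE"]           # a tree: fewest roads
ring = list(zip(nodes, nodes[1:] + nodes[:1]))  # one extra road
complete = list(combinations(nodes, 2))     # a road for every pair

for name, edges in [("star", star), ("ring", ring), ("complete", complete)]:
    print(name, "roads:", len(edges), "total hops:", total_hops(nodes, edges))
# star roads: 4 total hops: 16
# ring roads: 5 total hops: 15
# complete roads: 10 total hops: 10
```

The tree is cheapest to build, the complete graph is cheapest to travel, and anything in between trades one cost against the other.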


It seems that navigation memory theory should imply not a hierarchical structure, but a wiki-like structure with many links. In a tree, there’s only one path to a given element, which is not the case in the physical world.


> Navigation memory is the most core type of memory- most other forms of memory evolved later. There’s a reason why GPS usage is correlated with dementia. Human memory actually evolved out of a sense of navigation.

That seems very possible, and probably important, but it's hard for me to relate that to the experience (as an "anatomically modern human") of having other kinds of associative memory that are very effective and don't have a discernible spatial or other hierarchical component.


I agree. But are there any better solutions than manually ln -s? I'm in a band, and also manage booking for a venue. I have $venue/poster/$date\ $bands/$posterfile. I also have $band/poster/$date\ $venue

I don't know of any system that lets a single poster be in multiple places at the same time.


If you want to model this using your filesystem, that's exactly why symlinks (shortcuts on Windows and Mac) were invented.
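
For the poster case, a scripted version of the symlink approach might look like the sketch below. The directory layout and file names are invented for the example, and it runs in a temporary directory. Creating symlinks on Windows may need extra privileges.

```python
import os
import tempfile

# Work in a throwaway directory; the layout below is a made-up example.
root = tempfile.mkdtemp()
venue_dir = os.path.join(root, "venue", "poster")
band_dir = os.path.join(root, "band", "poster")
os.makedirs(venue_dir)
os.makedirs(band_dir)

# The real file lives under the venue...
poster = os.path.join(venue_dir, "2024-05-01 some-band.pdf")
with open(poster, "w") as f:
    f.write("poster bytes")

# ...and the band folder gets a symlink pointing at it.
link = os.path.join(band_dir, "2024-05-01 some-venue.pdf")
os.symlink(poster, link)

with open(link) as f:
    print(f.read())            # same content, reached through the second path
print(os.path.islink(link))
```

This automates the `ln -s` step but does not remove it: one location is still the "real" one, and moving the original breaks the link.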

On Mac, you can write tags on files and then use Spotlight to search for them. Pick one (more or less arbitrary) primary category to use as the directory for the file, then write tags for the other ways you want to be able to search for it.


Tags are superior because tags can model hierarchies, but hierarchies cannot model tags. There are far too many times when a single document crosses multiple categories, and tags handle that well. I used Outlook for 15+ years and thought tags were a joke, then moved to GSuite for 13 years and learned to use tags, now I'm back on Outlook and I feel like I'm suffocating without them. That's two decades of experience with both systems. Not to make a fallacy / whizzing contest out of this, but how long have you tried both systems? I'm guessing not as long.


> Tags are superior because tags can model hierarchies

Tags are inferior because tags must be coerced into hierarchies.

Tags are inferior because they do not properly link hierarchies that they model without extensive software support (which is present for file directories by design, and absent for tags). I have yet to see a hierarchical tagging scheme work well when you need to do something like change a mid-level directory name (you end up having to re-write many tags, often without good software support for what you're trying to do)

Tags themselves are fine. It's a perfectly valid way to label data. It is not a good way to organize that data for human recall and reference.
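
To illustrate the rename problem described above, here is a small sketch assuming hierarchical tags are stored as slash-separated strings on each item (a common convention, but an assumption here). Renaming a mid-level name means visiting and rewriting every affected tag.

```python
def rename_prefix(items, old, new):
    """Rewrite every tag equal to `old` or nested under it; return how many tags changed."""
    changed = 0
    for name, tags in items.items():
        updated = set()
        for tag in tags:
            if tag == old or tag.startswith(old + "/"):
                updated.add(new + tag[len(old):])
                changed += 1
            else:
                updated.add(tag)
        items[name] = updated
    return changed

# Hypothetical items with slash-separated hierarchical tags.
items = {
    "a.jpg": {"projects/art/2023", "status/done"},
    "b.jpg": {"projects/art/2024"},
    "c.txt": {"projects/artifacts"},  # shares a string prefix but is a different branch
}

print(rename_prefix(items, "projects/art", "projects/artwork"))  # 2
print(sorted(items["a.jpg"]))
```

With directories the same change is a single rename of one folder. With tags it is a bulk edit that the tooling has to get right, including not touching look-alike names such as `projects/artifacts`.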


> It is not a good way to organize that data for human recall and reference.

Yet here I am: using them for recall and reference faster than hierarchies (after 30+ years of using both).


And here I am, using Johnny Decimal for over five years and I can find everything all the time. As Johnny himself said below, if it doesn't work for you - that's cool - use something else. But your assertion that this can't work is not correct. It's just that it can't work for YOU.


Hierarchies are better because they form a natural hypertext.

I'm in my documents folder. I see a list of all the categories of stuff I have. Whatever I'm looking for, it's in one of them. I go into a folder, and I see all the categories in that folder and none of the stuff outside of it. I've narrowed my focus and increased my depth. I can browse.

Sure, tags are more flexible, but (1) I find I almost never actually need them, because in most cases a hierarchy is good enough, and (2) tags don't function as a hypertext and won't let me explore. A big list of tags is much harder to dig through than nested folders.

Granted, it doesn't stop at tags or hierarchy. You can use both—on top of which, there are hierarchical tags, soft links, hard links, and even textual hyperlinks. But out of all of these, I find hierarchy to be the most important one. Given the choice among all of them, I always start with hierarchy and I typically find I don't need anything else.


Pretty sure Categories is what you're talking about for Outlook.


I’ve thought a bit about tags++, that is adding some logical and not-so-logical features to them.

For instance, there are ideas from OWL where you could define a category in terms of other categories and their attributes; for instance, tag D could be the union of tag A and tag B, restricted to the complement of tag C.
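
In plain set terms, one reading of that example is D = (A ∪ B) minus C. A minimal sketch with made-up item names follows. This is ordinary Python set algebra, not OWL syntax.

```python
# Hypothetical tag -> items mapping.
tagged = {
    "A": {"doc1", "doc2"},
    "B": {"doc2", "doc3"},
    "C": {"doc3", "doc4"},
}

def derived_d(tagged):
    """Tag D = (A union B) minus C, recomputed from the current tag sets."""
    return (tagged["A"] | tagged["B"]) - tagged["C"]

print(sorted(derived_d(tagged)))  # ['doc1', 'doc2']
```

Because D is computed rather than stored, it stays correct when A, B, or C change.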

Implication is also useful both as a way to implement subclassing but also containment relationships. For instance on Danbooru a character that has several forms would have the various forms of the character imply that character and the character would imply the media property that the character comes from.
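
Tag implication can be sketched as a transitive expansion over an "implies" table. The tag names below are placeholders, and this is only an illustration of the idea, not how any particular site implements it.

```python
# Hypothetical implication table: tag -> tags it directly implies.
implies = {
    "character_form_a": {"character"},
    "character_form_b": {"character"},
    "character": {"series"},
}

def expand(tags):
    """Return the given tags plus everything they imply, directly or indirectly."""
    result = set(tags)
    stack = list(tags)
    while stack:
        tag = stack.pop()
        for implied in implies.get(tag, ()):
            if implied not in result:
                result.add(implied)
                stack.append(implied)
    return result

print(sorted(expand({"character_form_a"})))
# ['character', 'character_form_a', 'series']
```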

I am looking at what a tagging system looks like in the transformer age and one key idea is a kind of three-valued logic around tags which can be in a “positive”, “indeterminate” and “negative” state. If you are training a machine learning system to auto tag you will need (1) a number of examples where a tag does not apply (the tag not being applied is not evidence that the tag doesn’t apply, poor coverage of negative examples is one reason why YouTube recommendation is worse than TikTok) and (2) to deal with cases where the ML model tags something incorrectly. If the model tagging something puts it in an indeterminate polarity and that result can later be switched to negative or positive that is a great way to manage the situation.
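
A small sketch of that three-state idea, under the assumption that model suggestions start out indeterminate and only a human review moves them to positive or negative. The class and method names are hypothetical.

```python
from enum import Enum

class Polarity(Enum):
    POSITIVE = "positive"
    INDETERMINATE = "indeterminate"
    NEGATIVE = "negative"

class TagStore:
    """Hypothetical store mapping (item, tag) pairs to a three-valued state."""

    def __init__(self):
        self._state = {}

    def model_suggests(self, item, tag):
        # A model guess only fills a gap; it never overwrites an existing state.
        self._state.setdefault((item, tag), Polarity.INDETERMINATE)

    def confirm(self, item, tag):
        self._state[(item, tag)] = Polarity.POSITIVE

    def reject(self, item, tag):
        self._state[(item, tag)] = Polarity.NEGATIVE

    def get(self, item, tag):
        # None means nobody (human or model) has looked at this pair yet.
        return self._state.get((item, tag))

    def training_examples(self, tag):
        """Return (positives, negatives); indeterminate and unseen pairs are left out."""
        pos = sorted(i for (i, t), p in self._state.items()
                     if t == tag and p is Polarity.POSITIVE)
        neg = sorted(i for (i, t), p in self._state.items()
                     if t == tag and p is Polarity.NEGATIVE)
        return pos, neg

store = TagStore()
for item in ("img1", "img2", "img3"):
    store.model_suggests(item, "cat")
store.confirm("img1", "cat")
store.reject("img2", "cat")
store.model_suggests("img1", "cat")  # no effect: a human already decided
print(store.training_examples("cat"))  # (['img1'], ['img2'])
```

Absence of a tag and an explicit "negative" are kept apart, which is what makes the negatives usable as training data.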


> ideas from OWL

What is OWL? Except for a good lesson in why not to use common and hence impossible to search for words as names for a project.



They used to call the semantic web that OWL is a part of “Web 3.0” which failed to make an impression or was overwritten with the “Web3” moniker for NFT grifts by exceptionally ignorant people.

I learned OWL the hard way: I had been involved with the semantic web for 10+ years on and off and didn’t meet anyone who knew how to do meaningful modeling with OWL until last year, and that even includes famous academics who’ve written books on it.


OWL and RDF interest me immensely, intellectually. I've never been positioned to use either one professionally, but it looks fascinating. Is there a shorter path to successful modeling than the hard way? Is there a good source on this?


RDF is not magic and OWL is… showing its age.

If you are willing to eat the up-front cost of coordinating global resource identification— a daunting task make no mistake, you get non-trivial dataset integration almost for free. Imagine if concatenating two ginormous JSON documents describing different aspects of the same entity would amount to a useful merge into a single combined JSON. If you Need this with a big N, RDF has no alternative.
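
The "merge for free" point can be illustrated without any RDF tooling. In the sketch below, plain Python tuples stand in for (subject, predicate, object) triples; the identifier and property names are invented, and a real triplestore would use full IRIs for predicates too.

```python
# Hypothetical shared identifier that both datasets agreed on up front.
BAND = "http://example.org/band/42"

dataset_a = {
    (BAND, "name", "Some Band"),
    (BAND, "genre", "rock"),
}
dataset_b = {
    (BAND, "genre", "rock"),       # overlapping fact, deduplicated by the set
    (BAND, "homeTown", "Liverpool"),
}

# Because both sides use the same identifier, merging is just set union.
merged = dataset_a | dataset_b

def describe(graph, subject):
    """Collect everything known about one subject as predicate -> set of values."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, set()).add(o)
    return out

print(len(merged))  # 3
print(describe(merged, BAND))
```

The hard part is the agreement on `BAND`, not the merge itself, which matches the up-front cost described above.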

The rise of SSDs has also more or less obviated the need for clustered indexes as a practical performance consideration. For the small price of trebling your storage footprint, commodity RDF triplestores will index _all_ your attributes/columns without a schema (usually red/black or equiv). Will it scan an integer PK over 100b records as fast as postgres? No. Is that use case in your hot path? Also no (most likely).

Edit: as for OWL, just take the plunge into rule based inference directly. From forward chaining inference (if you want performance and decidability guarantees) all the way up to full blown Prolog or [miniKanren](http://minikanren.org/) (if you want it in a library in your runtime of choice)
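
For a feel of what forward chaining means here, this is a naive sketch with two hard-coded rules over tuple "triples". The predicate names `type` and `subClassOf` are borrowed loosely from RDFS vocabulary, and the facts are made up; real rule engines are far more efficient than this quadratic loop.

```python
def forward_chain(facts):
    """Apply two rules until no new facts appear:
    (x subClassOf y), (y subClassOf z) => (x subClassOf z)
    (i type x),       (x subClassOf y) => (i type y)
    """
    facts = set(facts)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c or p2 != "subClassOf":
                    continue
                if p1 in ("subClassOf", "type"):
                    new.add((a, p1, d))
        if new <= facts:  # fixed point reached
            return facts
        facts |= new

# Hypothetical starting facts.
facts = {
    ("poster1", "type", "Poster"),
    ("Poster", "subClassOf", "Document"),
    ("Document", "subClassOf", "File"),
}

closed = forward_chain(facts)
print(("poster1", "type", "File") in closed)  # True
print(len(closed))  # 6
```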


I strongly disagree.

Everywhere where you have a lot of stuff to manage (photos, music, videos, documents, links) hierarchies don't work and only tags can tame all the chaos.

The analogy to "path finding" doesn't hold, imho. That's not how our brains organize information! We organize memories by association and not by some hierarchical structures.


there have been many, MANY historical attempts to organize the world's knowledge hierarchically. They have all failed spectacularly to achieve their goals.

some of the most common reasons

- things exist in multiple categories that aren't in the same branch of the tree

- different state of mind during data retrieval means you expect the same item to be in different categories.

- different humans think the same thing belongs in different hierarchical locations

there's also been a LOT of scientific research around informational organization. It all came to the same conclusion. Hierarchies have interesting promises but fail when they meet the practical reality of the human brain.

in the end hierarchical organization of knowledge is a terrible solution except in VERY restricted cases.


Do you have any suggestions of where to start reading on this? A seminal paper or cluster of papers? I want to deep dive on this not just to map out where it doesn't work but also to get a map of the restrictive cases where it does work.

edit: never mind, I just put your quote into gpt-4 and it passed me on to Eleanor Rosch, prototype theory and some other interesting works. I feel like this is my own modern lmgtfy moment.


Tags are great as an adjunct to a thoughtful folder hierarchy, IMHO.

Links are great as part of that too, they can provide shortcuts.

Real-world use: I am an artist, and I have found that the best way to organize my work is with a series of yearly directories. If I begin a large, multi-year project, it goes in a directory within the year I start it; I'll make a link to it that lives next to all the yearly directories.

I also use OSX's tags a ton. Files get marked as 'in progress', 'complete', 'paid for', 'commission', and 'experiment' (and a few other things). When I want to decide what to work on in any particular day it's super easy to open up the saved search for "everything in progress" that I keep on my desktop; this shows me everything in those yearly directories that's marked as 'in progress', whether it's personal work, client work, whether it's part of a large multi-file project with its own folder hierarchy or just a single file in the yearly directory. I also have a saved search for 'commission'+'in progress' for those days when I know I want to work on clearing the commission queue. And whenever I spend some time just fooling around with different effects to create interesting looks, I'll save my scribblings with the 'experiment' tag; when I decide to use it later I can easily tell Illustrator to open a file, and look through the 'experiment' tag to find the file full of some crazy procedural explorations, regardless of how long ago I did it. This habit has saved me hours of digging for that one file where I did that cool trick once.

Trying to organize all the files in my artwork directory with just tags would be a total fucking nightmare, the subdirectory for a multi-year graphic novel has its own folder hierarchy that's several levels deep, and when I know that what I want to work on today is "getting the prepress files together for book 3 of the graphic novel" it's definitely great to be able to just hit the top-level link to the graphic novel directory, then go into "books", then "3", and have its own little file hierarchy in there.

Tags by themselves are not very good for serious organization, but they can be very good for pulling things out of a hierarchical structure. They take work - I have to remember to mark a new file as 'in progress' and possibly a 'commission', though that's become routine, and changing something from 'in progress' to 'complete' is a pleasure. But it's work well worth doing to create a nice little network of shortcuts and secret passages through the terrain of your thoughtfully-laid-out tree of folders.


> You have a body that is adapted to the physical world - routing and navigation through a series of ordered steps is a VERY well developed human skill.

I find that this skill is better utilized with a system that has hyperlinks like Obsidian.

Also purely hierarchical systems break down over time, they can be supported with tags. https://karl-voit.at/2022/01/29/How-to-Use-Tags/

> To my surprise, we tend to think in hierarchical categories all the time. As I have written in my article on Logical Disjunct Categories Don't Work, the real world does not fit into disjunct categories.

> Therefore, we should embrace multi-classification more often. If you do want to learn more about the rationale, you may as well read the first chapters of my PhD thesis or the book "Everything is Miscellaneous" by David Weinberger, just to give you two resources of many.

> Long story short: tagging does take away the burden of finding one single spot in a strict hierarchy of entities which is actually a heavily intertwined network of concepts we do find in the real world. It's far from being a neat hierarchy. Everybody who tries to put "the world" into a strict hierarchy will fail.


The only reason we're even discussing the topic is because search is so poorly implemented in client operating systems. Tags suck, hierarchical structures suck, everything that isn't search sucks. Search still kind of sucks, but it sucks much more because the search available on your own computer for your own files is about thirty years behind the state of the art.


I hope hierarchies aren't disallowed sometime in the future - I could see it happening for phones.


Can't we have both ... tags and hierarchies?


Yes you just use a wiki with a traditional tree structure and search. I use Obsidian which lets you do just that.


I've done both as well, tagging everything and then assigning the tags into exclusive hierarchical relationships (for discovery purposes and grouping), but it only works to subdivide within an existing noun like "talent" or "wood panels", without a seed noun tags start becoming too abstract and the object with those tags start to lose all semantic cohesion.

I think once you start talking about unbounded universal tagging with hierarchies, they are not compatible, you need search and weighting or intelligent interfaces.

Search and LLMs really are major organizational improvements in our lifetime imo.


hierarchical tagging is the one true path


I think that's the whole point of this system, when you have infinite tags it's impossible to maintain a correct taxonomy, you add #great-graphics to this game, but now you have to backfill it to all other games, or in the future you may miss them.

They created this so the hierarchy is unambiguous (as much as possible), you want a document, you are two steps away from it in an easy to find way.

tag systems have far too much maintenance and adding a new tag is almost impossible to do exhaustively so you have a lot of partial tags.


> you want a document, you are two steps away from it in an easy to find way

This isn't a response to the parent commenter's point, right? They were describing how many projects have items where a resource easily fits within the scope of N different categories, at which point they become max N steps away from it, not max 2 steps.


Two thoughts:

1. This is much, much less likely with the enforced limits on categorization in the post.

2. No - you are still 2 steps away. Make a choice about where that item lives. If it's shared across many categories, maybe you really need a distinct category like "Ambiguous" or "Shared"


> No - you are still 2 steps away. Make a choice about where that item lives.

You misunderstand. The max N steps are at the point of recall, not categorization decision.


This is a great point about tag maintenance *if you have to make the tags yourself*. However, if you have a simple ML system that you can run to categorize your files and pull out good single word descriptors that have a large explained variance over your files, you can run this and check the tags that are constructed.

I think there's a good way forward that uses typical hierarchical Johnny.Decimal filesystems, with an overlay filesystem with tags that can update the tags every so often based on the content in the files. Obviously letting the user have a hand in this via a TUI/gui would be helpful for choosing tags for which they're comfortable.

Unfortunately I haven't settled on a good filesystem with tags (how to do this with ZFS?) or how to interact with it as a network filesystem served to many different OS (cifs with tags?).
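
As a very rough stand-in for the "simple ML system" idea, the sketch below suggests single-word tags for text files using TF-IDF scoring. That is a swapped-in technique chosen for brevity, not the explained-variance approach mentioned above, and it only handles plain text. The file names and contents are made up.

```python
import math
import re
from collections import Counter

def suggest_tags(docs, top_n=2):
    """Suggest up to top_n words per document, ranked by TF-IDF score."""
    tokenized = {name: re.findall(r"[a-z]+", text.lower()) for name, text in docs.items()}
    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))
    n = len(docs)
    suggestions = {}
    for name, words in tokenized.items():
        tf = Counter(words)
        scores = {w: tf[w] * math.log(n / df[w]) for w in tf}
        # Highest score first; ties broken alphabetically; drop words found everywhere.
        ranked = sorted((w for w in scores if scores[w] > 0), key=lambda w: (-scores[w], w))
        suggestions[name] = ranked[:top_n]
    return suggestions

# Hypothetical file contents.
docs = {
    "a.txt": "invoice invoice tax report",
    "b.txt": "poster band poster venue report",
    "c.txt": "tax invoice receipt report",
}

print(suggest_tags(docs, top_n=1))
# {'a.txt': ['invoice'], 'b.txt': ['poster'], 'c.txt': ['receipt']}
```

The output is a list of suggestions for a human to accept or reject, which fits the "let the user have a hand in this" point.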


It doesn't seem to me like a simple ML system, it needs to be able to extract tags from all kinds of filetypes (video, games, images, assets, text, ...), at a decent speed and then it has to assign tags to what you would also assign, because if it doesn't do that then it's even worse, because you can never find anything as your mapping and the ML mapping would not be the same.


> However, if you have a simple ML system

Or the old-school method, a community of people with tagging powers and a few moderators to do sanity checks.


The #great-graphics problem is not something a category-based system will solve either, as you have the same problem. Nothing will, to be honest; maybe AI eventually, and even it won't manage it in every case.

> you want a document, you are two steps away from it in an easy to find way

This is not how people work in general. This kind of thing might be OK for an institution, for taxonomy-like collections.


>Hierarchical systems and folders are artifacts of the physical world, in which a single object (a tool, pipe, screw, or book) cannot be in two places at the same time.

Many think hierarchies come from limits in the physical world but that's not what's happening. Yes, that's some of the cause but does not explain all of it.

The deeper rooted reason is that hierarchies are a convenience to aid the human mind. Even without any limitations of physical shelves, the brain likes to:

- notice the relationships from the general-to-specific and navigate them with spatial cues of dirs parent-->child-->grandchild-->etc

- group related items together -- using spatial cues of moving file icons into a file system folder

The world that the blog essay is working in is the OS file system. The various files have to be put somewhere on the file system. Since putting hundreds/thousands of files into a single flat folder is useless, one creates some child subfolders to organize it in some way.

The tagging system assumes a different mechanism (e.g. a separate "database" of tags which filesystems like Microsoft NTFS and Linux ext4 do not have natively.) This happens above the native filesystem. (Incidentally, by placing a file into a subfolder, the name of that folder and the names of parent folders above it act as an "implied set of tags" for free.)

That said, both hierarchical folders and tags solve different needs. Also, hierarchies simulate/approximate "tags" by "virtual folders" and 1-to-n softlinks. Likewise, tagging can simulate "hierarchies" via compound-multi-word-tags.
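
Both directions of that equivalence can be sketched in a few lines. The paths and tag strings below are invented, and the slash-separated compound tag format is an assumption.

```python
from pathlib import PurePosixPath

def implied_tags(path):
    """Folder names above a file, read as an implied set of tags."""
    return [part for part in PurePosixPath(path).parent.parts if part != "/"]

def tags_to_tree(compound_tags):
    """Turn compound tags like 'art/2023' into a nested dict that mimics folders."""
    tree = {}
    for tag in compound_tags:
        node = tree
        for part in tag.split("/"):
            node = node.setdefault(part, {})
    return tree

print(implied_tags("projects/novel/books/3/prepress.pdf"))
# ['projects', 'novel', 'books', '3']
print(tags_to_tree(["art/2023", "art/2024", "music"]))
# {'art': {'2023': {}, '2024': {}}, 'music': {}}
```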


Your argument seems to come up a fair amount in these discussions. In the end, you have to deal with storage of many items, and you can either browse or search. The browse approach requires you to know where you'll be browsing in the future. The searching approach requires you to know what to search for. No system is going to deliver all relevant documents, but you can do a good enough job with a hierarchical system plus search.


I think this is exactly right, and it is a facet of the same discoverability issues that crop up when people talk about GUI vs CLI - one is more useful when you're discovering, and one when you are searching. Tags are really set-based search operations like a SQL query, but the 'primary key' is the filename, and if you knew that you'd just search for it. You're rarely going to have a tag or attribute that can pinpoint a single document.
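
The comparison to SQL can be made literal. Below is a small sketch using an in-memory SQLite database; the table layout, file names, and tags are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (name TEXT PRIMARY KEY);
    CREATE TABLE file_tags (name TEXT REFERENCES files(name), tag TEXT);
""")

# Hypothetical files and their tags.
files = {
    "cover.psd": ["commission", "in progress"],
    "logo.ai": ["commission", "complete"],
    "poster.psd": ["commission", "in progress"],
    "sketch.ai": ["experiment"],
}
for name, tags in files.items():
    conn.execute("INSERT INTO files VALUES (?)", (name,))
    conn.executemany("INSERT INTO file_tags VALUES (?, ?)", [(name, t) for t in tags])

def with_all_tags(*tags):
    """Names of files that carry every one of the given tags."""
    placeholders = ",".join("?" * len(tags))
    rows = conn.execute(
        f"""SELECT name FROM file_tags
            WHERE tag IN ({placeholders})
            GROUP BY name
            HAVING COUNT(DISTINCT tag) = ?
            ORDER BY name""",
        (*tags, len(tags)),
    )
    return [row[0] for row in rows]

print(with_all_tags("commission", "in progress"))  # ['cover.psd', 'poster.psd']
```

The query returns a set of candidates rather than one file, which is the point made above: tags narrow things down, and only the filename pins down a single document.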


The article points out that it is too easy to create duplicate files. Part of that ties into what you're talking about. Part of that deals with how people deal with files (e.g. few people use versioning outside of software development). The article is suggesting that a strong hierarchical system will help to avoid that problem.

Of course the other problem with tags is management. Placing something into multiple relevant categories involves more effort. Failing to place something into a relevant category makes it harder to find since you are now dealing with either a flat file namespace (worse yet, a disorganized one) or a flat tag namespace. In theory, some of this can be handled by letting someone else handle the tags (e.g. the creator, the publisher, or the seller), but that has its own problems since there is frequently a conflict of interest (e.g. irrelevant tags are applied to increase the visibility of a product).

At the end of the day, we have to accept there is no perfect system of categorization. Some will prefer hierarchies. Some will prefer tags. From the tone of the article, it is clear that they prefer hierarchies.


> At the end of the day, we have to accept there is no perfect system of categorization. Some will prefer hierarchies. Some will prefer tags. From the tone of the article, it is clear that they prefer hierarchies.

I’m the Johnny who wrote Johnny.Decimal and this is basically it.

The OP clearly isn’t one of the people for whom finding JD is a massive mental relief. I know those people exist: they write and tell me.

Others find the idea baffling. Stupid, even. That’s fine. If this helps you, enjoy it. If it doesn’t, use something else.


Thank you. "If this helps you, enjoy it. If it doesn’t, use something else." is a sane, humble, and adult attitude. You have my respect.


I built a hierarchical note-keeping system for myself and have been intending to add tags to it, but I've never gotten around to it -- because the hierarchy is generally "good enough" after I added two features: linking, and grep.

Grep is self-explanatory. Linking works like hard links in Unix, where the same note appears as a child of multiple different parents (added a command to find "orphans" in case you unlink it from everywhere).

At this point I might not even bother adding tags.
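
A guess at what such a structure could look like, as a minimal sketch: notes are nodes, a note may be listed under several parents, and an "orphans" query finds notes with no parent left. The class and method names are hypothetical, and cycle checks are left out.

```python
class Notes:
    """Hypothetical note tree where one note may appear under several parents."""

    def __init__(self):
        self.children = {"root": []}  # note -> ordered list of child notes

    def add(self, note, parent="root"):
        self.children.setdefault(note, [])
        self.link(note, parent)

    def link(self, note, parent):
        # Like a hard link: the same note gains one more parent.
        if note not in self.children[parent]:
            self.children[parent].append(note)

    def unlink(self, note, parent):
        self.children[parent].remove(note)

    def parents(self, note):
        return [p for p, kids in self.children.items() if note in kids]

    def orphans(self):
        """Notes that still exist but are no longer listed under any parent."""
        return [n for n in self.children if n != "root" and not self.parents(n)]

notes = Notes()
notes.add("bands")
notes.add("venues")
notes.add("poster", parent="bands")
notes.link("poster", "venues")
print(notes.parents("poster"))  # ['bands', 'venues']

notes.unlink("poster", "bands")
notes.unlink("poster", "venues")
print(notes.orphans())  # ['poster']
```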


> Linking works like hard links in Unix, where the same note appears as a child of multiple different parents (added a command to find "orphans" in case you unlink it from everywhere). > At this point I might not even bother adding tags.

What you described with hard links is exactly how I use tags, so that would satisfy my need for tags as an organizational tool.


While I haven’t gone so far as to attribute a numbering system to my organization, I have done well at organizing things into red-line distinctive categories. The idea is to create categories that _cannot_ overlap. If there’s any commonality between them that’s not useless, they need to be grouped at a higher level.

As an example, if you’re organizing your toolbox, you don’t mark a drawer “hand tools” because it’s a useless categorization. You mark one “socket tools” which will include everything from the sockets and wrenches themselves to adapters that connect a socket to an impact wrench (but an impact wrench does not go in there because it is not exclusively a socket tool). If it really does come down to something that may really fit in two categories (hey, there’s always exceptions), you put your mindset in the place of yourself when you want to look it up: what’s the most common situation in which you’ll be looking this thing up?


This is the crux of it right here. You need to decide up front, thoughtfully and carefully - where you are going to put something. Just like in the physical world. Then you need to adjust and adapt it as you go. All the benefit comes downstream from those small additions to the workflow when you go to save something digitally.


I spent some time studying the world of professional home organization (as seen on YouTube) and the core concepts always come down to these:

* Allocate space up front in the form of containers

* Position containers around workspaces

* Use containers appropriate to the type of object and its use (e.g. "rounds in rounds" - put round bottles on turntable racks so you can spin to access)

* Duplicate objects you need to use in multiple locations, e.g. scissors for the kitchen and for the office

* Label spaces where things belong

And the key thing to it is that this isn't a hard rule like always organizing hierarchically or always labelling. The hierarchy helps compress space (that's why books and folders are powerful) and the labels help define uses, but in many instances, the level of organization you need is an open bin with some dividers - the drawer organizer, cube storage, cardboard box, book bin, cafe tray etc.

Computer file systems are somewhat resistant to unlabelled open-bin storage because that means you're allocating with less precision, but I think everyone in practice knows that they will shove things in "Documents" or "Downloads" and just periodically purge it.


I solved this problem with hard links. I became a fan of hierarchical systems; they just work.


Workflowy [1] solves this problem by supporting mirrored nodes as well as tags.

[1] https://workflowy.com/


Gödel's incompleteness theorems strike again.


But step 2 is to just "Make sure the buckets are unambiguously different."! How hard could that be? /s



