Hi, I like to learn about what resources are out there on the internet. I hope you have found my posts useful!

  • 20 Posts
  • 23 Comments
Joined 3 years ago
Cake day: July 1st, 2023


  • It’s quite sad that we are now at a point where we are forced to make our own search engines from scratch. Search engines are hard! Google’s original search algorithm (about 2 decades ago) was quite amazing. You could give vague search terms and still find the answer you wanted. The secret sauce was ranking results by relevance to the search query. I’m not aware of any guides/projects on building search engines. I wish there was a good way I could search for this. (The irony!) But a great starting resource is this series on networks from Wikipedia. (https://en.wikipedia.org/wiki/Network_theory)

    Some random tips:

    • The main goal of any search engine should be to minimize the number of times a user returns to the results page to click on a new link. Big tech should be doing this anyway, but they have other goals.
    • The main metadata database needs to topologically connect you to any part of the internet. (https://en.wikipedia.org/wiki/Graph_theory) Think of it as a hub/portal that gives you general directions but doesn’t tell you exactly where you should be heading. The ideal solution would be to download everything from the internet and check each page individually for relevance to a search query, but this is intractable. Instead you have to group the internet into graphs and subgraphs - STEM, Social, Forums, E-commerce, etc. Hyperlinks offer an objective way to calculate connections between websites. For example Lemmy.world <-> Wikipedia.org. The weight of these connections gives you a way to guide a traversal algorithm during search. Semantic analysis of some form lets you draw connections that hyperlinks alone miss, making your search more efficient.
    • The most powerful way to find connection/relevance to a search term is with transformers and their attention mechanism. For example, if the search query is “Open source search engine”, the attention heatmap would light up on groups of website subjects like Forums, Q&A, Programming, Network Science, etc. There would also be a negative heatmap for topics like Cooking, Sports, Entertainment, etc. From there you want to recursively load metadata for the websites in the relevant groups. For Lemmy, for example, that would be the titles of all posts (and maybe their top comments). Load as much of this as fits into a transformer and calculate the heatmap relative to the search query. Again, you are not using the transformer to generate answers. That is a bad idea. Instead you are using it to rank search results in terms of relevance/attention, which is what the transformer is fundamentally designed for.
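
    To make the graph tip a bit more concrete, here is a minimal sketch of a best-first traversal over a weighted link graph. Everything in it is an assumption for illustration: the `LINKS` weights, the `TOPICS` tags (a crude stand-in for real semantic analysis), the `*.example` sites, and the scoring rule of link weight times topic overlap. Treat it as one possible way to let connection weights guide a search, not a reference design.

```python
import heapq

# Hypothetical hyperlink graph: site -> {linked site: link weight}.
# All sites and weights here are made up for illustration.
LINKS = {
    "hub": {"lemmy.world": 0.9, "wikipedia.org": 0.8, "recipes.example": 0.7},
    "lemmy.world": {"wikipedia.org": 0.6, "programming.example": 0.5},
    "wikipedia.org": {"programming.example": 0.4},
    "recipes.example": {},
    "programming.example": {},
}

# Hypothetical topic tags, standing in for real semantic analysis.
TOPICS = {
    "hub": set(),
    "lemmy.world": {"forums", "programming"},
    "wikipedia.org": {"reference", "network science"},
    "recipes.example": {"cooking"},
    "programming.example": {"programming", "q&a"},
}


def best_first(start, wanted_topics, limit=3):
    """Visit sites in order of (link weight * topic overlap), highest first."""
    frontier = [(0.0, start)]  # heapq is a min-heap, so priorities are negated
    seen = {start}
    visited = []
    while frontier and len(visited) < limit:
        _, site = heapq.heappop(frontier)
        if site != start:
            visited.append(site)
        for neighbor, weight in LINKS[site].items():
            if neighbor in seen:
                continue
            seen.add(neighbor)
            overlap = len(TOPICS[neighbor] & wanted_topics)
            heapq.heappush(frontier, (-weight * overlap, neighbor))
    return visited


query_topics = {"forums", "programming", "network science"}
print(best_first("hub", query_topics))
```

    With these made-up numbers the cooking site is only reached once everything on-topic has been visited, which is the whole point of weighting the traversal.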

    As a side note, you are able to tune your model to your own search preferences with little data. You are also able to exchange computation time for search quality! This is amazing. If computation is a concern, traditional traversal algorithms and basic relevance/ranking algorithms work too but at the cost of more engineering.
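
    If you go the cheaper route, a basic relevance ranking over page metadata could look roughly like the sketch below. It swaps the transformer for plain TF-IDF scoring, which is my choice for the example and not the only option. The titles are invented, and `tokenize` and `rank` are hypothetical helper names.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def rank(query, titles):
    """Return titles sorted by a simple TF-IDF score against the query."""
    docs = [Counter(tokenize(title)) for title in titles]
    n_docs = len(docs)

    def idf(term):
        # Smoothed inverse document frequency: rarer terms count for more.
        doc_freq = sum(1 for doc in docs if term in doc)
        return math.log((n_docs + 1) / (doc_freq + 1)) + 1.0

    query_terms = set(tokenize(query))
    scored = []
    for title, doc in zip(titles, docs):
        length = sum(doc.values())
        score = sum(doc[term] / length * idf(term) for term in query_terms)
        scored.append((score, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Invented post titles, standing in for crawled metadata.
titles = [
    "Best pasta recipes for beginners",
    "Open source alternatives to popular apps",
    "Building an open source search engine",
]
for title in rank("open source search engine", titles):
    print(title)
```

    The extra engineering shows up quickly: this has no stemming, no synonyms, and no notion of meaning, which is exactly the gap the transformer approach is supposed to close.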

    I hope this helps. If you have any other questions, feel free to ask! The future of search will likely be self-hosted, as conflicts of interest within current search engine providers degrade their quality to the point where they are unusable.


















  • Please feel free to ask any questions! Math is a wonderful field full of beauty, but unfortunately almost all education systems fail to show this and instead make it seem like raw robotic calculation rather than creativity.

    Math is best learned visually and with context that ties the more abstract terms to something concrete. 3Blue1Brown is the best resource for this, in my opinion!

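    Here’s a mind-blowing fact for you, along with a video from 3Blue1Brown. Imagine sliding a 1,000,000 kg box across a frictionless ice surface into a stationary 1 kg box that sits between it and a wall. Assume every collision is perfectly elastic. The 1 kg box hits the wall, bounces back, hits the 1,000,000 kg box again, and so on until the big box is sliding away too fast for the small one to catch it.

    The total number of collisions is 3141, the first four digits of pi. Crazy, right? Each time you multiply the big box’s mass by 100 you get one more digit. Why would pi appear here? If you want to learn more, here’s a video from the best math teacher in the world.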

    https://www.youtube.com/watch?v=HEfHFsfGXjs
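
    If you would rather check this yourself than take my word for it, here is a small sketch that counts the collisions. It assumes perfectly elastic collisions and no friction, and it only tracks velocities because the order of collisions can be worked out from those alone. The function name `count_collisions` is mine, not something from the video.

```python
def count_collisions(big_mass, small_mass=1.0):
    """Count collisions for a big block sliding into a small block near a wall.

    The wall is on the left, the small block sits between the wall and the
    big block, and positive velocity means moving away from the wall.
    """
    v_big, v_small = -1.0, 0.0  # big block starts out moving toward the wall
    collisions = 0
    while True:
        if v_big < v_small:
            # The gap between the blocks is closing: elastic block-block collision.
            total = big_mass + small_mass
            v_big, v_small = (
                ((big_mass - small_mass) * v_big + 2 * small_mass * v_small) / total,
                ((small_mass - big_mass) * v_small + 2 * big_mass * v_big) / total,
            )
        elif v_small < 0:
            # The small block is heading for the wall and bounces straight back.
            v_small = -v_small
        else:
            # Both blocks move away from the wall and the big one is not slower.
            break
        collisions += 1
    return collisions


for exponent in range(4):
    mass = 100 ** exponent
    print(mass, count_collisions(mass))
# 1 3
# 100 31
# 10000 314
# 1000000 3141
```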














  • Half-Life is awesome! If I had to choose from all the games I’ve played, the Half-Life series has to be the best, in my opinion. There are so many small details and world-building elements in each Half-Life game that it really does feel like you are in the game. And then there is Half-Life: Alyx…

    It was so immersive, I smashed my head against the wall trying to run away from something lol

    Warning: Half-Life: Alyx spoilers. Do not read this unless you have played the game. It is a once-in-a-lifetime experience.

    I had that moment in Half-Life: Alyx at Jeff. No other game has made me feel genuine fear the way that level did. You felt as if you were physically there, trying to escape from death. Best gaming experience ever!









  • Awesome! Thanks for this post, by the way! It has been my favorite post to go through, since I don’t have a Google account and can’t use their recommendation system. Plus, the recommendation system definitely doesn’t surface channels as high-quality as some of the ones in this thread.

    There are still hundreds if not thousands of hidden gems on YouTube. I’ll come back to this post and share some as I rediscover them.

    Here is another one before I go: Astartes. Sad they stopped, but glad they at least found employment for their skills.