Behold… “Marginalia”! My new favorite search engine!! And I have @mattof to thank for this find. Here’s their blog post about it, since I don’t think I could do a better job describing what it is. But, tl;dr: it’s a #smallweb-focused search engine.
The web is such garbage these days 😔 Or is it the garbage search engines? 🤔
@Codebuzz@www.codebuzz.nl I have separate mail boxes for private and work, but flattened both to have a simpler structure. For work, where we use Outlook, I am using categories for organising the mails and privately I am using Vivaldi’s labels system. The main idea is to use search and grouping through dynamic saved searches instead of static folders.
So I’ve flattened my work and private email inboxes to single inbox folders and I don’t even know anymore what I was thinking before trying frantically to organise everything in sub folders. Labels and search filters are the way forward.
I swear I did write up an algorithm for it at some point; I think it is lost in a git comment someplace. I’ll put together some pseudo/Go code this week.
Super simple:
Making a reply:
- If yarn has one use that. (Maybe do collision check?)
- Make hash of twt raw no truncation.
- Check local cache for shortest without collision
- in SQL:
  select len(subject) where head_full_hash like subject || '%'
Threading:
- Get full hash of head twt
- Search for twts
- in SQL:
  head_full_hash like subject || '%' and created_on > head_timestamp
The assumption being replies will be for the most recent head. If replying to an older one it will use a longer hash.
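The steps above could be sketched roughly like this in Python. This is only an illustration under assumptions: the hash function (BLAKE2b, hex-encoded), the minimum prefix length of 7, and the names `full_hash`, `shortest_subject` and `thread_replies` are all made up for the example and are not taken from yarnd or any spec. Timestamps are assumed to be ISO 8601 strings in the same UTC format, so plain string comparison orders them correctly.

```python
import hashlib


def full_hash(raw_twt: str) -> str:
    """Hash the raw twt line and keep the whole digest (no truncation).

    The choice of BLAKE2b and hex encoding is an assumption for this sketch.
    """
    return hashlib.blake2b(raw_twt.encode("utf-8"), digest_size=32).hexdigest()


def shortest_subject(head_hash: str, known_hashes: set[str], min_len: int = 7) -> str:
    """Return the shortest prefix of head_hash that no other cached hash shares."""
    others = known_hashes - {head_hash}
    for length in range(min_len, len(head_hash) + 1):
        prefix = head_hash[:length]
        if not any(h.startswith(prefix) for h in others):
            return prefix
    return head_hash  # only reached if another hash is identical


def thread_replies(head_hash: str, head_timestamp: str, twts):
    """Mirror of: head_full_hash like subject || '%' and created_on > head_timestamp.

    twts is an iterable of (subject, created_on, text) tuples.
    """
    return [
        t for t in twts
        if t[0] and head_hash.startswith(t[0]) and t[1] > head_timestamp
    ]
```

A reply to an older head whose short prefix is already taken by a newer twt simply ends up with a longer subject, which matches the assumption stated above.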
Diving into mblaze, I think I’ve nearly* reached peak email geek.
Just a bunch of shell commands I can pipe together to search, list, view and reply to email (after syncing it to a local Maildir).
EXAMPLES at https://git.vuxu.org/mblaze/tree/README
So far I’m using most of the tools directly from the command line, but I might take inspiration from https://sr.ht/~rakoo/omail/ to make my workflow a bit more efficient.
*To get any closer, I think I’d have to hand-craft my own SMTP client or something.
@movq@www.uninformativ.de Yes, the tools are surprisingly fast. Still, magrep takes about 20 seconds to search through my archive of 140K emails, so to speed things up I would probably combine it with an indexer like mu, mairix or notmuch.
So in a location-based system, how exactly do I reply to one of these two Twts from @Yarns@search.twtxt.net ? 🤔
2024-09-07T12:55:56Z 🥳 NEW FEED: @<twtxt http://edsu.github.io/twtxt/twtxt.txt>
2024-09-07T12:55:56Z 🥳 NEW FEED: @<kdy https://twtxt.kdy.ch/twtxt.txt>
@falsifian@www.falsifian.org comments on the feeds as in nick, url, follow, that kind of thing? If that, then not interested at all. I envision an archive that would allow searching, and potentially browsing threads on a nice, neat interface. You will have to think, though, about other things. Like, what to do with images? Yarn allows users to upload images, but also to embed them in twtxts from other sources (hotlinking, actually).
@prologic@twtxt.net I believe you when you say registries as designed today do not crawl. But when I first read the spec, it conjured in my mind a search engine. Now I don’t know how things work out in practice, but just based on reading, I don’t see why it can’t be an API for a crawling search engine. (In fact I don’t see anything in the spec indicating registry servers shouldn’t crawl.)
(I also noticed that https://twtxt.readthedocs.io/en/latest/user/registry.html recommends “The registries should sync each others user list by using the users endpoint”. If I understood that right, registering with one should be enough to appear on others, even if they don’t crawl.)
Does yarnd provide an API for finding twts? Is it similar?
@prologic@twtxt.net I guess I thought they were search engines. Anyway, the registry API looks like a decent one for searching for tweets. Could/should yarn.social pods implement the same API?
@prologic@twtxt.net What’s the difference between search.twtxt.net and the /api/plain/tweets endpoint of a registry? In my mind, a registry is a twtxt search engine. Or are registries not supposed to do their own crawling to discover new feeds?
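For anyone who has not looked at the registry endpoint being discussed, here is a rough Python sketch of building a search URL for `/api/plain/tweets` and parsing a plain-text response. It assumes the `q` query parameter and the tab-separated line layout (nick, feed URL, timestamp, message) described in the registry docs; `registry.example` is a placeholder host and the function names are invented for the example.

```python
from urllib.parse import urlencode


def tweets_search_url(registry_base: str, query: str) -> str:
    """Build a search URL for a registry's plain-text tweets endpoint."""
    return f"{registry_base.rstrip('/')}/api/plain/tweets?{urlencode({'q': query})}"


def parse_plain_tweets(body: str) -> list[dict]:
    """Parse tab-separated lines: nick, feed URL, timestamp, message.

    Lines that do not have exactly four fields are skipped.
    """
    results = []
    for line in body.splitlines():
        parts = line.split("\t", 3)
        if len(parts) != 4:
            continue
        nick, url, timestamp, text = parts
        results.append({"nick": nick, "url": url, "timestamp": timestamp, "text": text})
    return results
```

Nothing here crawls anything; it only shows what a client of such an API would see, whichever way the server discovers its feeds.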
Never mind, I simply searched and deleted them all (D then ~f sender). :-) Phew!
s#(www\.)?youtube\.com/watch\?v=([^?&]+)#tubeproxy.mills.io/play/\2#
for example? 🤔
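The same kind of rewrite can be tried out in Python with `re.sub`. This is a sketch, not how any client actually does it: the `/play/<id>` target comes from the substitution quoted above, while the function name and the exact character class for the video ID are assumptions. Note that the literal `.` and `?` have to be escaped, and that the video ID is the only capturing group here because the `www.` part is non-capturing.

```python
import re

# Optional "www.", literal "youtube.com/watch?v=", then the video ID up to the
# next "?", "&" or whitespace.
YOUTUBE_WATCH = re.compile(r"(?:www\.)?youtube\.com/watch\?v=([^?&\s]+)")


def to_tubeproxy(text: str) -> str:
    """Rewrite YouTube watch links in text to tubeproxy.mills.io/play/<id>."""
    return YOUTUBE_WATCH.sub(r"tubeproxy.mills.io/play/\1", text)
```

Anything after the video ID (such as `&t=5`) is left in place untouched, so you may want to strip it depending on what the proxy accepts.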
Have not tried any of them, but some of these seem to fit the bill:
@movq@www.uninformativ.de I’ve been using Qwant for a while but it was down earlier today (as well 😆) so I switched back to my trusty Searx Redirector
… This utility forwards your search query to one of 11 random volunteer-run public servers to thwart mass surveillance.
QOTD: Which web search engine do you use? 😂
Hah 🤣 @dfaria@twtxt.net Your @dfaria.eu@dfaria.eu feed really does consume more than 50% of a “Discover” search with filters “Without replies” and “Hide my posts”. 🤣
36/2 = 18; at 25 Twts per page, that’s about 72% of the search/view real estate you’re taking up! Wow 🤩 – I’d be very interested to hear what ideas you have to improve this? Those search filters were created so you could sift through either your own Timeline or the Discover view easily.
Added support for #tag clouds and #search to timeline. Based on code from @dfaria.eu@dfaria.eu🙏
Live at: http://darch.dk/timeline/?profile=https://darch.dk/twtxt.txt
You come across such things… Here, lots of books in plain-text format: https://github.com/ganesh-k13/shell/tree/master/test_search/www.glozman.com/TextPages
Google Chrome Gains AI Features Including a Writing Helper
Google is adding new AI features to Chrome, including tools to organize browser tabs, customize themes, and assist users with writing online content such as reviews and forum posts.
The writing helper is similar to an AI-powered feature already offered in Google’s experimental search experience, SGE, which helps users draft emails in various tones and lengths. W … ⌘ Read more
So, I finally got day 17 to under a second on my machine. (in the test runner it takes 10)
I implemented a Fibonacci Heap to replace the priority queue to great success.
https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go#L168-L268
OH MY FREAKING HECK. So.. I made my pather able to run as Dijkstra or A* if the interface includes a heuristic.. when i tried without the heuristic it finished faster :|
So now to figure out why it’s not working right.
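For reference, a pather that runs as Dijkstra or A* depending on whether a heuristic is supplied can be sketched in Python as below. This is not the code from the linked repo; the names (`find_path_cost`, `grid_neighbors`, `manhattan`) and the tiny grid are made up for illustration, and `node` can be any hashable state you choose. With no heuristic the priority is just the cost so far (Dijkstra); with one, the priority is cost plus estimate (A*). The estimate must never overshoot the true remaining cost, otherwise the answer can come out wrong.

```python
import heapq


def find_path_cost(start, goal, neighbors, heuristic=None):
    """Return the cheapest cost from start to goal, or None if unreachable.

    neighbors(node) yields (next_node, step_cost) pairs.
    heuristic(node) estimates the remaining cost; None means plain Dijkstra.
    """
    h = heuristic or (lambda node: 0)
    best = {start: 0}
    frontier = [(h(start), 0, start)]  # (priority, cost so far, node)
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + h(nxt), new_cost, nxt))
    return None


# Hypothetical example grid: each digit is the cost of entering that cell.
GRID = ["1199", "9119", "9911"]
GOAL = (2, 3)


def grid_neighbors(pos):
    r, c = pos
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
            yield (nr, nc), int(GRID[nr][nc])


def manhattan(pos):
    # Never overestimates here because every cell costs at least 1.
    return abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1])
```

Both modes should agree on the cost; if A* is slower than Dijkstra on a real input, the heuristic or the tie-breaking in the queue is a reasonable first place to look.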
man… day17 has been a struggle for me.. i have managed to implement A* but the solve still takes about 2 minutes for me.. not sure how some are able to get it under 10 seconds.
Solution: https://git.sour.is/xuu/advent-of-code/src/branch/main/day17/main.go
A* PathFind: https://git.sour.is/xuu/advent-of-code/src/branch/main/search.go
some seem to simplify the seen check to only be horizontal/vertical instead of each direction.. but it doesn’t give me the right answer
The word forms is part two. In this one you want to find the first digit and last digit. Think searching ‘1’ - ‘9’
I could have made my search smarter using a prefix search rather than scanning the full buffer for each iteration.
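A prefix check at each position could look something like this Python sketch. It assumes the task is to find the first and last digit in a line where digits may appear either as the characters '1' to '9' or spelled out as words; the word table and function names are invented for the example and are not the original solution.

```python
WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}


def digit_at(text: str, i: int):
    """Return the digit that starts at position i, or None if there is none."""
    ch = text[i]
    if ch in "123456789":
        return int(ch)
    for word, value in WORDS.items():
        # Prefix test at offset i instead of rescanning the whole buffer.
        if text.startswith(word, i):
            return value
    return None


def first_and_last_digit(text: str):
    """Return (first, last) digit found in text, or None if there are none."""
    found = [d for i in range(len(text)) if (d := digit_at(text, i)) is not None]
    return (found[0], found[-1]) if found else None
```

Checking every position (rather than consuming matched words) means overlapping words like "eightwo" still yield both 8 and 2.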
@prologic@twtxt.net I use the gmail webapp for work, and I have to say that over the years it’s gotten less and less usable. There are so many little usability things that it’s bad at. For instance, if you select a message and hit the Delete key nothing happens. The message is not put in the trash like you’d expect. There are issues like that scattered all over the app. I suspect they spend most of their energy on the spyware side of gmail and dedicate less to making it a useful app for end users (which seems to be true of their search engine too).
@jmjl@tilde.green I’m sorry that I’m not super knowledgeable about alternatives to jmp.chat but I’ll tell you what I know.
You’re probably right about jmp.chat not working for you, at least as it is now. You can only get US and Canadian phone numbers through it last time I checked, so if you’re not in either of those countries you’d be making international calls all the time and people who wanted to call you would be making international calls too.
I’ve seen people talk about using SIP as an intermediary: you can bridge SIP-to-XMPP, and bridge SIP-to-PSTN (PSTN = “public switched telephone network”, meaning normal telephone). You can skip the SIP-to-XMPP side if you’re comfortable using a SIP client. I don’t know very much about SIP or PSTN so I am not sure what to recommend, but perhaps this helps your search queries.
There are a fair number of services like TextNow that let you sign up for a real telephone number that you can then use via their app (I wouldn’t use TextNow–they had tons of spyware in their app). I don’t know if that kind of service works for you but if it does perhaps you’d be able to find one of them that isn’t horrible. This page (https://alternativeto.net/software/jmp-chat/) has a bunch of alternatives; I can’t vouch for any of them but maybe it’s a starting point if you want to go this route.
Good luck!
@abucci@anthony.buc.ci Are you still with jmp.chat? If so, are you still as happy as you were before? Have you experienced any reliability issues, especially with receiving phone calls?
@marado@twtxt.net It’s very different. Language models are part of traditional search engines and translation engines. The new policy mentions Cloud AI and Bard specifically. This is a weird change and probably a good preemptive move, as I said previously. I’m not sure why you’re downplaying it.
An official FBI document dated January 2021, obtained by the American association “Property of People” through the Freedom of Information Act.
This document summarizes the possibilities for legal access to data from nine instant messaging services: iMessage, Line, Signal, Telegram, Threema, Viber, WeChat, WhatsApp and Wickr. For each software, different judicial methods are explored, such as subpoena, search warrant, active collection of communications metadata (“Pen Register”) or connection data retention law (“18 USC§2703”). Here, in essence, is the information the FBI says it can retrieve:
Apple iMessage: basic subscriber data; in the case of an iPhone user, investigators may be able to get their hands on message content if the user uses iCloud to synchronize iMessage messages or to back up data on their phone.
Line: account data (image, username, e-mail address, phone number, Line ID, creation date, usage data, etc.); if the user has not activated end-to-end encryption, investigators can retrieve the texts of exchanges over a seven-day period, but not other data (audio, video, images, location).
Signal: date and time of account creation and date of last connection.
Telegram: IP address and phone number for investigations into confirmed terrorists, otherwise nothing.
Threema: cryptographic fingerprint of phone number and e-mail address, push service tokens if used, public key, account creation date, last connection date.
Viber: account data and IP address used to create the account; investigators can also access message history (date, time, source, destination).
WeChat: basic data such as name, phone number, e-mail and IP address, but only for non-Chinese users.
WhatsApp: the targeted person’s basic data, address book and contacts who have the targeted person in their address book; it is possible to collect message metadata in real time (“Pen Register”); message content can be retrieved via iCloud backups.
Wickr: Date and time of account creation, types of terminal on which the application is installed, date of last connection, number of messages exchanged, external identifiers associated with the account (e-mail addresses, telephone numbers), avatar image, data linked to adding or deleting.
TL;DR Signal is the messaging system that provides the least information to investigators.
I never paid a lot of attention to Ben Shapiro before, but what he says is so transparently asinine it boggles the senses. You really have to have a Fox-addled mind to believe that the search for the submersible was completely faked and that the powers-that-be knew the entire time that it had imploded. To believe that a vast conspiracy among hundreds, thousands (?) of people from several countries and spanning several days was orchestrated to lie to the public in order to…..uh, achieve what exactly? “Undermine institutional credibility”? What does that even mean?
This is “the moon landing was faked” levels of conspiracy theory.
@prologic@twtxt.net The hackathon project that I did recently used openai and embedded the response info into the prompt. So basically i would search for the top 3 most relevant search results to feed into the prompt and the AI would summarize to answer their question.
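The general shape of that approach (pick the top 3 search results, paste them into the prompt) can be sketched in Python as below. This is not the hackathon code: the word-overlap scoring is a deliberately naive stand-in for whatever search was really used, the prompt wording is invented, and no model is called here.

```python
def score(question: str, doc: str) -> int:
    """Count how many distinct question words also appear in the document."""
    return len(set(question.lower().split()) & set(doc.lower().split()))


def top_k(question: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring documents, best first."""
    return sorted(docs, key=lambda d: score(question, d), reverse=True)[:k]


def build_prompt(question: str, docs: list[str], k: int = 3) -> str:
    """Embed the top-k search results into a prompt for a language model."""
    context = "\n".join(f"- {d}" for d in top_k(question, docs, k))
    return (
        "Answer the question using only these search results:\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The string returned by `build_prompt` is what you would send to the model of your choice; the model then summarizes the results to answer the question.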
I have used Linux for most of my life, and it has been my daily driver for nearly two decades now. I have been bugged recently by how, when I exit, the terminal buffer has not been cleared, leaving whatever contents available for the next user to view.
After a quick man zsh I found the STARTUP/SHUTDOWN FILES section, and then a quick search on resetting the terminal buffer led me to <esc>c or printf "\033c".
In five minutes something which has bothered me for who knows how long was resolved. Just needed some motivation to figure it out.
I was listening to an O’Reilly hosted event where they had the CEO of GitHub, Thomas Dohmke, talking about CoPilot. I asked about biased systems and copyright problems. He said that in the next iteration they will show name, repo and licence information next to the code snippets you see in CoPilot. This should give a bit more transparency. The developer still has to decide whether to adhere to the licence. On the other hand, I have to say he is right that probably every one of us has used a code snippet from Stack Overflow (where in 99% of cases no licence or copyright is mentioned), GitHub repos or some tutorial website without mentioning where the code came from. Of course, CoPilot has been trained on a lot of code from public repos. It is more or less a much faster and better search engine than the existing tools have been, because how much code has been used from public GitHub repos without adding the source to the code you pasted it into?
@prologic@twtxt.net I get the worry of privacy. But I think there is some value in the data being collected. Do I think that Russ is up there scheming new ways to discover what packages you use in internal projects for targeting ads?? Probably not.
Go has always been driven by usage data. Look at modules. There was need for having repeatable builds so various package tool chains were made and evolved into what we have today. Generics took time and seeing pain points where they would provide value. They weren’t done just so it could be checked off on a box of features. Some languages seem to do that to the extreme.
Whenever changes are made to the language there are extensive searches across public modules for where the change might cause issues or could be improved with the change. The fs embed and strings.Cut come to mind.
I think it’s good that the language maintainers are using what metrics they have to guide where to focus time and energy. Some of the other languages could use it, so time and effort isn’t wasted in maintaining something that has little impact.
The economics of the “spying” are to improve the product and ecosystem. Is it “spying” when a municipality uses water usage metrics in neighborhoods to forecast need of new water projects? Or is it to discover your shower habits for nefarious reasons?
I’ve never liked the idea of having everything displayed all of the time for all of history.
And I still don’t: Search and Bookmarks are better tools for this IMO.
From a technical perspective however, we will not introduce any CGO dependencies into yarnd – It makes portability harder.
Also I hate SQL 😆
Yes, but no. This didn’t happen before, it will drive me nuts. That search sucks, by the way. I know, I am being gentle. 😂
Tutorial: Getting started with generics - The Go Programming Language – Okay @xuu@txt.sour.is I quite like Go’s generics now 🤣 After going through this myself I like the semantics and the syntax. I’m glad they did a lot of work on this to keep it simple to both understand and use (just like the rest of Go) 👌
#GoLang #Generics
ChatGPT is good, but it’s not that good 🤣 I asked it to write a program in Go that performs double ratcheting and well the code is total garbage 😅 – It’s only as good as the inputs it was trained on 🤣 #OpenAI #GPT3
Interview with an NFT enthusiast - YouTube – Bahahahahahahaha 🤣 #NFT
How do I install gomodot? Also.. @prologic@twtxt.net your domain has some pretty strong SEO mojo: searching for install "gomodot" puts you on the first page of Google.
@prologic@twtxt.net, search for “quark” and you will get quack, quart, quirk, and all possible iterations. Not too helpful.
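One plausible explanation (an assumption, the post does not say how the search is configured) is fuzzy matching with an edit distance of 1: quack, quart and quirk are each a single substitution away from quark. A small Python sketch of that idea, with invented function names:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_distance: int = 1) -> list[str]:
    """Return the words within max_distance edits of term, in input order."""
    return [w for w in vocabulary if edit_distance(term, w) <= max_distance]
```

With `max_distance=0` only the exact term survives, which is roughly the behaviour being asked for; ranking exact hits above fuzzy ones would be a middle ground.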
📣 NEW: Announcing the new and improved Yarns search engine and crawler! search.twtxt.net – Example search for “Hello World” Enjoy! 🤗 – @darch@neotxt.dk When you have this, this is what we need to work on in terms of improving the UI/UX. As a first step you should probably try to apply the same SimpleCSS to this codebase and go from there. – In the end (didn’t happen yet, time/effort) most of the code here in yarns will get reused directly into yarnd, except that I’ll use the bluge indexer instead.
Got an acknowledgement of our Salty.im funding proposal to NLnet this evening. I look forward to the outcome 🤞 #Salty.im
Does anyone of you use PGP encrypted mail, or any kind or email encryption? Why? Why not?
Red Line Through HTTPS
⌘ Read more
#Wordle 235 4/6*
⬛🟨🟨⬛⬛
🟨🟨⬛⬛⬛
⬛⬛🟨🟨🟩
🟩🟩🟩🟩🟩
Alien Mission
⌘ Read more
@prologic@twtxt.net let us take the path of least resistance, that is, less effort, for now. I am going to be a great-grandfather before search ever gets implemented locally, let alone one to search on “all pods”. In other words, let us not bite off more than we can chew. 😹 Neep-gren!