FLOSS.social @admin

Thanks to @bleakgrey (and I think I recall someone else being involved), a new Odysseus release is coming out soon with support for a "reader mode".

I find it ridiculous that I feel the need to support this feature; it's saying "webdevs are doing such a poor job that I need to offer to clear away their mess!"

In celebration of this I will describe how this code (the same as used in Firefox and Pocket) works.

May 20, 2019, 05:09 AM··Tootle

**alcinnz** @alcinnz · May 20, 2019

@bleakgrey When a page loads, this code injects some JavaScript to check if it's probably "readerable".

That is, it looks through any visible <p>s (not in an <li>), <pre>s, or <br>-containing <div>s with more than 140 characters, discards some depending on class names, and sums the square roots of the remaining elements' character counts.

If that sum is high enough, it sends a message to the UI telling it to show the button offering a reader mode.
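
A rough sketch of that check, in Python rather than the real JavaScript, working on character counts instead of a DOM. The cutoff of 20 is an assumption (no value is given above), and it is also a guess that the root is taken of the full count rather than of the excess over 140.

```python
import math

MIN_CHARS = 140       # from the description: blocks need more than 140 characters
SCORE_THRESHOLD = 20  # assumption: the actual cutoff is not stated above


def is_probably_readerable(block_lengths, threshold=SCORE_THRESHOLD):
    """block_lengths: character counts of blocks that survived the class-name filter."""
    score = 0.0
    for length in block_lengths:
        if length <= MIN_CHARS:
            continue  # too short to count as article text
        score += math.sqrt(length)
        if score > threshold:
            return True  # enough long text to offer reader mode
    return False
```

Under these assumptions two blocks of 150 characters are already enough, while any number of short blocks never is.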

**alcinnz** @alcinnz · May 20, 2019

@bleakgrey When you click that button, there are three layers to the JavaScript run in the page in order to remove its junk.

The first layer uses document.write() to drop the existing markup from the page. Then, in addition to the page's text, it adds back in its extracted title and byline. It also computes an estimated reading time at 200 words per minute, before removing any attribute besides "src" and "href" and annotating the page with a theme class.
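
The two mechanical steps there are easy to sketch in Python. Both function names are made up for illustration, counting words by splitting on whitespace is an assumption, and so is rounding up to whole minutes.

```python
import math

KEPT_ATTRIBUTES = {"src", "href"}  # the only attributes said to survive


def estimated_reading_minutes(text, words_per_minute=200):
    """Estimate reading time; rounding up is an assumption."""
    words = len(text.split())
    return math.ceil(words / words_per_minute)


def strip_attributes(attributes):
    """Drop every attribute except src and href from one element's attribute dict."""
    return {name: value for name, value in attributes.items()
            if name in KEPT_ATTRIBUTES}
```

For example, a 450-word article comes out at 3 minutes, and `strip_attributes({"href": "/a", "class": "btn"})` keeps only the `href`.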

**alcinnz** @alcinnz · May 20, 2019

@bleakgrey The next layer might (if configured to do so) consider giving up if there are too many elements on the page.

Then it removes all <script>, <noscript>, and <style> tags, before replacing "chains" of <br>s with a <p> containing their subsequent siblings and replacing all <font>s with <span>s.

From there it examines the meta tags for useful information (filling in any missing excerpts), and after layer 3 it (unnecessarily here) makes links absolute and removes any classes.
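
The <br> handling is the least obvious part, so here is a minimal sketch of it in Python over a flat list of sibling nodes. Treating a "chain" as two or more consecutive <br>s is an assumption, as is the list-and-tuple representation; the real code works on DOM nodes.

```python
BR = "<br>"  # stand-in marker for a <br> element


def replace_br_chains(siblings):
    """Group whatever follows a run of 2+ BR markers into ("p", [...]) tuples."""
    result = []
    current = result  # list that currently receives nodes
    i = 0
    while i < len(siblings):
        j = i
        while j < len(siblings) and siblings[j] == BR:
            j += 1
        run = j - i  # number of consecutive BR markers starting at i
        if run >= 2:
            paragraph = []  # the chain is dropped and a new <p> starts
            result.append(("p", paragraph))
            current = paragraph
            i = j
        elif run == 1:
            current.append(BR)  # a lone <br> is an ordinary line break
            i = j
        else:
            current.append(siblings[i])
            i += 1
    # a chain at the very end would leave an empty paragraph behind
    return [node for node in result if node != ("p", [])]
```

So `["intro", BR, BR, "one", BR, "two"]` becomes `["intro", ("p", ["one", BR, "two"])]`.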

**alcinnz** @alcinnz · May 20, 2019

Layer 3 considers elements which:
* are not marked hidden
* don't look like a byline/the author name (those are rendered separately)
* (optionally) pass a check based on the absence/presence of certain classes, unless they're in a <table>
* are a <section>, <h2+>, <p>, <td>, or <pre>
* or are inline-containing <div>s (rewritten to <p>s)
* or are <div>s containing a single <p> (as on mobile.slate.com)
* and have more than 25 characters
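
As a sketch, that list can be read as one predicate. The `Element` class, its flags, and the set of inline tags below are all invented for illustration, and the optional class-name check is left out because its rules aren't spelled out here.

```python
from dataclasses import dataclass, field

SCORABLE_TAGS = {"section", "h2", "h3", "h4", "h5", "h6", "p", "td", "pre"}
INLINE_TAGS = {"a", "b", "em", "i", "span", "strong"}  # assumption: partial list
MIN_TEXT_LENGTH = 25


@dataclass
class Element:  # hypothetical stand-in for a DOM element
    tag: str
    text: str = ""
    hidden: bool = False
    looks_like_byline: bool = False
    children: list = field(default_factory=list)


def is_candidate(el):
    """Return True if the element would be considered for scoring."""
    if el.hidden or el.looks_like_byline:
        return False
    if len(el.text) <= MIN_TEXT_LENGTH:
        return False
    if el.tag in SCORABLE_TAGS:
        return True
    if el.tag == "div":
        child_tags = [child.tag for child in el.children]
        if child_tags == ["p"]:
            return True  # <div> wrapping a single <p>
        # <div> holding only inline content (or bare text) is treated like a <p>
        return all(tag in INLINE_TAGS for tag in child_tags)
    return False
```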

**alcinnz** @alcinnz · May 20, 2019

From there it scores each of those elements by the number of:
* paragraphs
* commas
* hundreds of characters (up to 3)
These scores also count towards the parent elements, but scaled down by depth (especially beyond depth 2).
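
A small Python sketch of that scoring. Giving one point per paragraph follows the list above; the exact dividers used for ancestors (1 for the parent, 2 for the grandparent, then three times the distance minus one) are an assumption chosen to match "especially beyond depth 2".

```python
def paragraph_score(text):
    """Score one paragraph's text by the three counts listed above."""
    score = 1                          # one point for being a paragraph
    score += text.count(",")           # one point per comma
    score += min(len(text) // 100, 3)  # one point per 100 characters, capped at 3
    return score


def ancestor_share(score, depth):
    """Portion of a paragraph's score credited to the ancestor `depth` levels up.

    depth 1 is the parent, depth 2 the grandparent, and so on.
    The dividers are assumptions, not taken from the description.
    """
    if depth == 1:
        divider = 1
    elif depth == 2:
        divider = 2
    else:
        divider = (depth - 1) * 3
    return score / divider
```

With these dividers a paragraph scoring 6 gives 6 to its parent, 3 to its grandparent, and 1 to the ancestor above that.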

From there it scales these scores by how much of the text is in links (a likely indicator of navigation) and tracks only the top 5 candidates.
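
One way to sketch that step in Python. Multiplying by one minus the link density is an assumption about what "scales" means here, and the tuple layout and function names are made up for the example.

```python
import heapq


def adjust_for_links(score, text_length, link_text_length):
    """Scale a score down by the fraction of its text that sits inside links."""
    if text_length == 0:
        return 0.0
    link_density = link_text_length / text_length
    return score * (1 - link_density)


def top_candidates(candidates, limit=5):
    """candidates: iterable of (name, score, text_length, link_text_length).

    Returns up to `limit` (name, adjusted_score) pairs, best first.
    """
    adjusted = [(name, adjust_for_links(score, length, link_length))
                for name, score, length, link_length in candidates]
    return heapq.nlargest(limit, adjusted, key=lambda pair: pair[1])
```

A navigation block whose text is almost entirely links ends up with a score near zero, so it drops out of the top 5 even if it is long.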

Then it looks to see if it captures more useful text by looking at ancestors and/or siblings.