Clipping to OneNote

Chinese speakers can read my answer on Zhihu, but this blog post is more complete.

In this blog entry, I compare various methods of clipping web pages to OneNote. Too long, didn’t read? Jump to the Summary section at the end.

Environment

The experiments are conducted with Microsoft Edge and OneNote for Windows Desktop. Four approaches are tested:

  • OneNote Clipper ‘full page’ mode
  • OneNote Clipper ‘article’ mode
  • Print to OneNote
  • Select all, copy and paste in OneNote

Two web pages are used:

  • The entry on my blog
  • The article on Medium

The pros and cons below also take into account other pages I have clipped.

OneNote Clipper ‘full page’ mode

Screenshots

Preview for blog entry
Result for blog entry
Preview for Medium article
Result for Medium article

Analysis

Upon invocation, OneNote Clipper sends two POST requests to Microsoft, each with the HTML of the whole document as its body. The request required for ‘full page’ mode is sent to https://www.onenote.com/onaugmentation/clipperDomEnhancer/v1.0/. The response payload is JSON containing the rendered image, encoded in Base64.
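To make the idea of a Base64 image inside a JSON payload concrete, here is a minimal Python sketch that decodes such a response and checks the image signature. The key name `image` and the optional `data:` URL prefix are assumptions for illustration. I have not documented the real response schema of the clipperDomEnhancer endpoint, and the payload below is built locally, not captured from a server.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def decode_rendered_image(payload: str, key: str = "image") -> bytes:
    """Return the raw image bytes from a JSON response body.

    `key` is a hypothetical field name; adjust it to whatever the
    actual response uses.
    """
    encoded = json.loads(payload)[key]
    # Tolerate a data URL such as "data:image/png;base64,..." (assumption).
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


def image_format(blob: bytes) -> str:
    """Guess the image format from its leading magic bytes."""
    if blob.startswith(PNG_SIGNATURE):
        return "png"
    if blob.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"


# Usage with a locally built, fake payload (not a real server response).
fake_png = PNG_SIGNATURE + b"rest-of-file"
payload = json.dumps(
    {"image": "data:image/png;base64," + base64.b64encode(fake_png).decode("ascii")}
)
blob = decode_rendered_image(payload)
print(image_format(blob), len(blob))
```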

Pros

  • The page is pasted into OneNote as-is (well, most of the time, if extension CSS is not loaded).
  • High availability: available on Edge, Chrome and other platforms.

Cons

  • The links are lost. If you want them, you have to go back to the original page, which might become unavailable later.
  • Page clutter, such as navigation elements and advertisements, is included.
  • The page cannot be rendered if it is served internally (e.g., by a local server) or if its stylesheets require credentials.
  • The renderer seems to use awful fonts, especially for Chinese characters.
  • The resulting page in OneNote contains a single huge image, on which OneNote runs OCR (optical character recognition). This is especially wasteful because the (alternative) text is already available from the web page, so OCR is ‘reinventing the wheel’. Moreover, OneNote handles Chinese poorly. I am not blaming the recognition rate, which is actually fine, but the stupid behaviour of inserting a space between every pair of adjacent Chinese characters! The recognised text is also broken at line endings, which might not be an issue for Latin text but is annoying for Chinese text.
  • The resulting page does not automatically change style for High Contrast themes.
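If you do end up with OCR text from a ‘full page’ clip, the two Chinese-specific problems above can be patched afterwards. The following Python sketch is my own assumption-laden cleanup, not a OneNote feature. It treats CJK punctuation, the CJK Unified Ideographs block and full-width forms as ‘Chinese’, rejoins lines that were broken between two such characters, and then deletes spaces between them. Spaces next to Latin text are left alone.

```python
import re

# Characters treated as "Chinese" for this cleanup (an assumption; extend as needed):
# CJK symbols and punctuation, CJK Unified Ideographs, full-width forms.
CJK = r"\u3001-\u303f\u4e00-\u9fff\uff01-\uff60"

# A single line break (plus surrounding spaces/tabs) sitting between two CJK characters.
BREAK_BETWEEN_CJK = re.compile(rf"(?<=[{CJK}])[ \t]*\n[ \t]*(?=[{CJK}])")
# One or more spaces/tabs sitting between two CJK characters.
SPACE_BETWEEN_CJK = re.compile(rf"(?<=[{CJK}])[ \t]+(?=[{CJK}])")


def clean_ocr_chinese(text: str) -> str:
    """Undo OCR artefacts: rejoin CJK lines first, then drop CJK-internal spaces."""
    text = BREAK_BETWEEN_CJK.sub("", text)
    return SPACE_BETWEEN_CJK.sub("", text)


print(clean_ocr_chinese("汉 语 使 用 者\n可 以 阅 读"))
print(clean_ocr_chinese("OneNote 的 clipper"))
```

Only a single line break is removed, so a blank line between two paragraphs survives. The character ranges are deliberately narrow: Latin line breaks and spaces between Latin and Chinese text are not touched.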

OneNote Clipper ‘article’ mode

Screenshots

Preview for blog entry
Result for blog entry
Preview for Medium article
Result for Medium article

Analysis

Upon invocation, OneNote Clipper sends two POST requests to Microsoft, each with the HTML of the whole document as its body. The request required for ‘article’ mode is sent to https://www.onenote.com/onaugmentation/clipperextract/v1.0/?renderMethod=extractAggressive&url=<url>&lang=<lang>. The response payload is JSON with the image (render result) encoded in Base64.
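The query string of the ‘article’ request carries the page URL itself, so that URL has to be percent-encoded. Here is a small Python sketch of how such a request URL could be assembled. The endpoint and the parameter names `renderMethod`, `url` and `lang` are copied from the observed request above. The example page URL and the `lang` value `en-US` are placeholders, and whether the real clipper encodes the URL exactly this way is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "https://www.onenote.com/onaugmentation/clipperextract/v1.0/"


def article_extract_url(page_url: str, lang: str) -> str:
    """Build the 'article' extraction URL; urlencode percent-encodes the page URL."""
    query = urlencode({
        "renderMethod": "extractAggressive",
        "url": page_url,
        "lang": lang,
    })
    return f"{ENDPOINT}?{query}"


page = "https://example.com/post?id=1&ref=home"
request_url = article_extract_url(page, "en-US")
print(request_url)
# Round trip: decoding the query string gives the original page URL back.
print(parse_qs(urlparse(request_url).query)["url"][0])
```

Encoding matters here because the page URL may contain its own `?` and `&`. Without encoding, these would be read as extra parameters of the clipper request.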

Pros

  • The links are kept.
  • Page clutter, such as navigation elements and advertisements, is excluded. This depends on the extraction algorithm, however, which might fail to capture the ‘main’ element if the document is not very ‘accessible’.
  • The clipped page is easy to edit and search, with no OCR involved. Moreover, the font issue is automatically resolved.
  • High availability: available on Edge, Chrome and other platforms.
  • The resulting page automatically adjusts to High Contrast themes.

Cons

  • The resulting page loses almost all formatting.
    • Elements whose content partly comes from pseudo-elements lose those parts. This can be surprising and confusing to users.
    • Hidden elements might be revealed. My blog uses four variants of each image that changes under High Contrast themes, and normally only one of them is shown at a time. However, OneNote Clipper puts all four into OneNote, i.e., each image is quadrupled. Medium implements progressive images manually, and the current version of OneNote Clipper triples each image.
    • A CSS display value that differs from the element’s default is not respected, resulting in unwanted line breaks or lines being merged.
  • The resulting page has to comply with OneNote formatting rules.
    • Each image takes a line of its own, which is especially bad if the image is also multiplied. Floating layout is removed as well.
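The formatting losses listed above are what you would expect from an extractor that reads the markup but never evaluates stylesheets. The toy Python model below, built on the standard library's `html.parser`, illustrates this. It is a hypothetical simplification, not OneNote Clipper's actual algorithm. The sample markup, class names and block-tag list are made up: four image variants whose visibility a stylesheet would normally control, and a `span` that CSS turns into a block.

```python
from html.parser import HTMLParser

# Tags this toy extractor treats as block-level purely by name (hypothetical subset).
BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "figure", "nav"}


class CssBlindExtractor(HTMLParser):
    """Toy extractor that looks at tags only and never evaluates CSS."""

    def __init__(self) -> None:
        super().__init__()
        self.lines = [""]
        self.images = []

    def _break_line(self) -> None:
        if self.lines[-1]:
            self.lines.append("")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break_line()  # decided by tag name, not by the computed `display`
        if tag == "img":
            # Visibility rules live in the stylesheet, which is never consulted,
            # so every variant is kept.
            self.images.append(dict(attrs).get("src"))

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break_line()

    def handle_data(self, data):
        # Only text present in the DOM is seen; ::before/::after content never appears.
        self.lines[-1] += data

    def text_lines(self):
        return [line.strip() for line in self.lines if line.strip()]


SAMPLE = """
<figure>
  <img class="normal" src="chart.png"><img class="hc-black" src="chart-hc-black.png">
  <img class="hc-white" src="chart-hc-white.png"><img class="hc-other" src="chart-hc-other.png">
</figure>
<p>Result: <span style="display:block">42</span></p>
"""

extractor = CssBlindExtractor()
extractor.feed(SAMPLE)
extractor.close()
print(extractor.images)        # all four variants, although a browser shows one
print(extractor.text_lines())  # the CSS block is merged into the surrounding line
```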

Print to OneNote

Screenshots

Preview for blog entry
Result for blog entry
Preview for Medium article
Result for Medium article

Pros

  • Most of the time, the page is pasted into OneNote as-is. Moreover, careful designers sometimes provide @media print rules for printed documents, which improve the printing (and hence clipping) experience.
    • For example, this blog site hides navigation elements and the comment area when printing. Moreover, collapsible areas are always expanded, and collapse/expand UI elements are hidden. Images that have High Contrast alternatives display their non-High Contrast version. In a word, this site is invariant under printing!
  • The inserted printout carries (alternative) text directly from the printer driver, so no OCR is required and no spaces are inserted between Chinese characters. The font is exactly the one used by the site.
  • Formatting is kept, and so is text from pseudo-elements.
  • You can access the printed document (as XPS document) in OneNote.
  • High availability: almost all browsers support printing.

Cons

  • The links are lost. If you want them, you have to go back to the original page, which might become unavailable later.
  • If the site is not well designed, page clutter, such as navigation elements and advertisements, is included. Moreover, the site’s printing experience might be extremely bad. For example, printing may produce empty pages or displace elements.
  • Limited availability: you have to use OneNote for Windows Desktop to receive printouts. I’m not sure if one can do this on Mac.
  • The resulting page does not automatically change style for High Contrast themes.

Select all, copy and paste in OneNote

Screenshots

Result for blog entry
Result for Medium article

Analysis

This is similar to ‘article’ mode.

Pros

  • The links are kept.
  • The clipped page is easy to edit and search, with no OCR involved. Moreover, the font issue is automatically resolved.
  • High availability: any version of OneNote works and copying is available on most systems. However, some systems might not be able to copy HTML, and some versions of OneNote cannot paste HTML.
  • The resulting page automatically adjusts to High Contrast themes.
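Whether copy and paste keeps links and structure depends on the clipboard carrying HTML rather than plain text. On Windows, browsers put HTML on the clipboard in the ‘HTML Format’ (CF_HTML) layout. This consists of a short ASCII header of byte offsets followed by the markup, with the copied part delimited by `<!--StartFragment-->` and `<!--EndFragment-->` comments. Below is a minimal Python sketch that builds such a payload. It only constructs the bytes and does not touch the real clipboard. The helper names are hypothetical, and the 10-digit zero padding is my choice to keep the header length fixed.

```python
HEADER_TEMPLATE = (
    "Version:0.9\r\n"
    "StartHTML:{:010d}\r\n"
    "EndHTML:{:010d}\r\n"
    "StartFragment:{:010d}\r\n"
    "EndFragment:{:010d}\r\n"
)
PREFIX = "<html><body><!--StartFragment-->"
SUFFIX = "<!--EndFragment--></body></html>"


def build_cf_html(fragment: str) -> bytes:
    """Wrap an HTML fragment in a CF_HTML header; offsets are UTF-8 byte offsets."""
    # Fixed-width offsets make the header length independent of the values.
    header_len = len(HEADER_TEMPLATE.format(0, 0, 0, 0).encode("ascii"))
    prefix, body, suffix = (s.encode("utf-8") for s in (PREFIX, fragment, SUFFIX))
    start_html = header_len
    start_fragment = start_html + len(prefix)
    end_fragment = start_fragment + len(body)
    end_html = end_fragment + len(suffix)
    header = HEADER_TEMPLATE.format(start_html, end_html, start_fragment, end_fragment)
    return header.encode("ascii") + prefix + body + suffix


def read_offsets(payload: bytes) -> dict:
    """Parse the numeric 'Key:value' lines of the five-line header."""
    offsets = {}
    for line in payload.split(b"\r\n")[:5]:
        key, sep, value = line.partition(b":")
        if sep and value.isdigit():
            offsets[key.decode("ascii")] = int(value)
    return offsets


fragment = '<p>汉语 <a href="https://example.com/">link</a></p>'
data = build_cf_html(fragment)
off = read_offsets(data)
print(data[off["StartFragment"]:off["EndFragment"]].decode("utf-8"))
```

The offsets count bytes, not characters, so the Chinese text in the fragment shifts them by three bytes per character under UTF-8. Getting that wrong is a classic way for HTML paste to fail.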

Cons

  • Page clutter, such as navigation elements and advertisements, is included.
    • On the other hand, this keeps the page complete.
  • The resulting page loses almost all formatting.
    • Elements whose content partly comes from pseudo-elements lose those parts. This can be surprising and confusing to users.
    • Hidden elements might be revealed, as in ‘article’ mode: my blog’s High Contrast image variants and Medium’s progressive images can end up multiplied in the note.
    • A CSS display value that differs from the element’s default is not respected, resulting in unwanted line breaks or lines being merged.
    • My blog site exploits this to remove navigation elements from the pasted note.
  • The resulting page has to comply with OneNote formatting rules.
    • Each image takes a line of its own, which is especially bad if the image is also multiplied. Floating layout is removed as well.

Summary

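Item \ Method              | Clip ‘full page’ | Clip ‘article’            | Print                            | Copy and paste
---------------------------|------------------|---------------------------|----------------------------------|--------------------------
Good format                | ✔ (except font)  | ✘                         | ✔                                | ✘
Good font                  | ✘                | adjustable                | original                         | adjustable
Clutter-free               | ✘                | ✔ (depends on extraction) | depends on site                  | ✘
OCR-free                   | ✘                | ✔                         | ✔                                | ✔
Searchable                 | ✔ (via OCR)      | ✔                         | ✔                                | ✔
No OCR space issue         | ✘                | ✔                         | ✔                                | ✔
No line break issue        | ✘                | ✔                         | ✔                                | ✔
Accessible (High Contrast) | ✘                | ✔                         | ✘                                | ✔
Editable                   | ✘                | ✔                         | ✘                                | ✔
Keeps links                | ✘                | ✔                         | ✘                                | ✔
Available                  | ✔                | ✔                         | OneNote for Windows Desktop only | ✔ (HTML copy might fail)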

Which of the four fits best surely depends on the scenario, and on the specific web page and content being clipped. A priori, however, I would recommend printing for general-purpose clipping, and ‘article’ mode for articles with few or no images and (maths) formulae. The current implementation of ‘full page’ clipping is poor. Microsoft should consider rendering the page into a canvas to obtain the image, which would resolve the font issue.

A tip

For ‘full page’ mode or printing, the resulting page consists of one or multiple images. There are scenarios in which one would not like to edit the images directly. For example, the clipped content is an article, a tutorial, a piece of news, etc. In those scenarios, one would like to make side notes or comments to the content. There is a great option for this scenario.

Take printing as an example. After the printouts are inserted and loaded, select them all, open the context menu and choose ‘Set Picture as Background’. All pages are then set as background, which means they cannot easily be selected, edited, moved or removed, and you can easily create an editing box over them. However, the images do not change under High Contrast themes, so under certain High Contrast themes the content of an editing box placed over an image might be hard to see! To keep the page accessible, make notes beside the images rather than over them. The following is an example of making notes on this Zhihu answer.

Making notes on a printed page

As a further tip, one can print many things to OneNote: parts of PDF documents, Word documents, and even OneNote pages! If you want to make side notes or comments on another OneNote page without polluting the original, you can print that page to a new page. This is useful if you want to discuss your side notes or comments with someone else before committing the changes to the original page.
