How can I strip HTML from a rich text field?

Given that Excel doesn't handle the rich text fields well for data queries, another solution would be to create a plain text field with the HTML stripped.

Does anyone have a way to do that, or code to do it? I can imagine something that would accumulate all the non-HTML characters into a new string and paste the resulting string into a new field. If someone has already done that in AppScript, I would love to see it.

No, I can't just copy the rich text field to the plain text field - that carries all the HTML codes and makes them visible.
Responses (6)
  • Accepted Answer

    Jeff Malin
    Monday, November 16 2015, 02:06 PM - #Permalink
    If you are open to a client-side solution (which would set the field's plain-text 'mirror' at the same time the HTML field is entered or edited), a quick Google search for "javascript jquery strip html" turns up a couple of promising suggestions in pure JS or JS+jQuery. This of course doesn't help with any existing records, but depending on how many there are you could "backfill" by opening and re-saving each one. A Form Action set to run on Form Submission which uses Execute Javascript would do the trick.

    AppScript would be pretty difficult given the lack of support for regular expressions.
  • Accepted Answer

    Monday, November 16 2015, 02:40 PM - #Permalink
    Yes, I'm discovering the limitations of vbscript 4.0. The quickest solution is string manipulation that inserts a space where every tag was, but it drops line feeds. It's a mess.

    I've got a colleague who is miles better at JavaScript than I am. I'll have him take a look at it...or just persuade the project manager that rich text isn't worth the trouble it causes.
    • Jeff Malin
      more than a month ago
      If your line feeds are HTML <br> tags, you could use a VBScript Replace call to switch them to vbCrLf characters before stripping the rest of the tags.
    • Jessica Weissman
      more than a month ago
      This looks promising.
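
The two steps discussed in this reply (swap the line-break tags for line feeds, then drop everything between angle brackets) can be sketched without any regular expressions. This is only an illustration written in JavaScript, not AppScript or VBScript; the function name `stripTagsByScanning` and the list of `<br>` spellings are assumptions made for the example, and the same logic would have to be ported to VBScript string functions by hand.

```javascript
// Hypothetical helper: strips tags by scanning characters, with no regular expressions.
function stripTagsByScanning(html) {
  // Step 1: turn a few common <br> spellings into line feeds before stripping.
  // (Assumed list; extend it if your editor emits other variants.)
  const brVariants = ["<br>", "<br/>", "<br />", "<BR>", "<BR/>", "<BR />"];
  let text = html;
  for (const br of brVariants) {
    text = text.split(br).join("\n");
  }

  // Step 2: copy every character that is not inside <...> into the result.
  let result = "";
  let insideTag = false;
  for (const ch of text) {
    if (ch === "<") {
      insideTag = true;
    } else if (ch === ">" && insideTag) {
      insideTag = false;
      result += " "; // one space where each tag was, so neighbouring words do not fuse
    } else if (!insideTag) {
      result += ch;
    }
  }
  return result;
}

console.log(JSON.stringify(stripTagsByScanning("<p>Hello<br>World</p>")));
```

Because a space is left where each tag was, the result usually wants a final trim. Character entities such as `&amp;` are not decoded by this sketch.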
  • Accepted Answer

    Wednesday, November 18 2015, 11:33 AM - #Permalink
    • Jessica Weissman
      more than a month ago
      Interesting. I am not sure that security around our database lets us do such things, but it is worth looking into. Thanks.
    • David Goodale
      more than a month ago
      Nice find. This sets up the ability to pull data into Excel directly via read only stored procedure and / or view.
  • Accepted Answer

    Wednesday, January 10 2018, 12:31 PM - #Permalink
    ModScript has Regex, which in theory could allow you to remove anything matching <[^>]*> (my best guess for html tags). You could first replace anything matching <\s*br\b[^>]*> (my best guess at a regex for <br/>, <br>, <br >, etc) with a \n.
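
For illustration only, here is how those two patterns behave when applied in order. The sketch is in JavaScript rather than ModScript, since the exact ModScript regex calls are not shown in this thread; the function name `stripTagsWithRegex` is made up for the example, and only the two patterns come from the reply above.

```javascript
// Patterns taken from the reply above; names are hypothetical.
const BR_TAG = /<\s*br\b[^>]*>/gi; // <br>, <br/>, <br >, <BR />, ...
const ANY_TAG = /<[^>]*>/g;        // anything else between angle brackets

function stripTagsWithRegex(html) {
  return html
    .replace(BR_TAG, "\n") // line breaks first, so they survive as line feeds
    .replace(ANY_TAG, ""); // then drop every remaining tag
}

console.log(JSON.stringify(stripTagsWithRegex("<p>Line one<br/>Line two</p>")));
// prints "Line one\nLine two"
```

The order matters: if the general pattern ran first, the `<br>` tags would be removed along with everything else and the line feeds would be lost. Character entities such as `&amp;` are left untouched by this sketch.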
  • Accepted Answer

    Monday, January 07 2019, 10:49 AM - #Permalink
    Has anyone written the ModScript code for this? I am a bit lost as to how to do it.
  • Accepted Answer

    Tuesday, January 08 2019, 11:02 PM - #Permalink
    The simple answer is that regular expressions are incapable of dealing with the complexities of HTML. HTML is not "regular". For a more complete answer, see this thread on Stack Exchange. The author starts to have fun about a third of the way through, then goes right off the rails, but essentially the answer is: Nope. Can't get there from here.
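
To make that concrete, here are two inputs on which the simple tag pattern suggested earlier in the thread gives the wrong result. This is a JavaScript sketch; `naiveStrip` is a hypothetical name, and whether inputs like these ever occur depends on what the rich text editor actually emits.

```javascript
// The simple "anything between angle brackets" pattern from earlier in the thread.
const NAIVE_TAG = /<[^>]*>/g;

function naiveStrip(html) {
  return html.replace(NAIVE_TAG, "");
}

// A ">" inside an attribute value ends the match too early,
// so part of the attribute leaks into the text.
const attrCase = naiveStrip('<a title="x > y">link</a>');

// The tags are removed, but the script body is kept as if it were visible text.
const scriptCase = naiveStrip("<script>var a = 1;</script>Hi");

console.log(JSON.stringify(attrCase));   // prints " y\">link"
console.log(JSON.stringify(scriptCase)); // prints "var a = 1;Hi"
```

For markup that never contains such constructs the pattern may be good enough; handling HTML in general calls for a real parser.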