  1. #1
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    Web filter - miscategorization... again!

    Fully aware that I'm sounding like a stuck vinyl record but I've got to ask again whether there is a better way to ensure URLs are categorised accurately and remain so?

    ...Having reported directly to Zvelo that plus.google.com was miscategorised earlier in the year, and it subsequently being re-categorized correctly as "social networking", today I find it's now categorized as "search engines".

    [Attached image: Screen Shot 2017-05-19 at 10.16.31.png]

    While this particular re-miscategorisation may be of no consequence to many, it does show that the use of the built-in category lists relies on Zvelo getting the re-categorisation requests right. If they have trouble differentiating the various services offered by a technology giant such as Google, can we really trust them to get it right with the more obscure stuff out there on the visible web?

  2. #2
    Master Untangler
    Join Date
    Feb 2016
    Posts
    218

    Originally posted by sharrisonUK:
    If they have trouble with differentiating the various services offered by a technology giant such as Google, can we really trust them to get it right with the more obscure stuff out there on the visible web?
    I understand being troubled by incorrect categorization. I hope no one is actually comfortable with it. For what it's worth, to me, it would be far easier to get confused about the scope of Google's domains than about some obscure site on the web. And having been involved in Google's own problems accurately categorizing sites it crawls, and given the possibly nasty repercussions of the consequent search results, it's not lost on me that even technology giants are still groups of people presumably doing their best.

    That said, it would be nice to have 100% accuracy and consistency. I'm just not sure what better way is being overlooked.

    We can rely on category lists or on content filtering. Those are our basic options. Neither is totally reliable, for different reasons. In the case of malware sites, I'd rather trust a category list than a content filter. So while I'm on board with wishing we had better accuracy, it's not clear to me what better way there might be out there that's being overlooked.

  3. #3
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    It was the re-classification of plus.google.com from a relevant to a less relevant category that particularly irked me this morning!


  4. #4
    Master Untangler
    Join Date
    Feb 2016
    Posts
    218

    It would be nice to know the process involved. Did somebody suggest a less relevant category? What triggers a re-categorization? How are suggestions evaluated? Is it an automated process, or does it have an automated element? Something seems amiss in that process, whatever it is.

  5. #5
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    Re-categorisations are triggered by the likes of you and me on the Zvelo Live page or through the Untangle Web Filter Category Lookup / Recategorise feature...

    This quote from an email exchange back on 21st February 2017 with a Zvelo employee sheds some light on the process. Note that any one URL can have multiple categories which can often lead to block/pass conflicts:

    The re-categorization feature will take care of this because once a URL is reviewed as a result of a re-categorization request, we will assign up to 3 categories based on the content. This is a human categorization review so someone on the Web Analyst team will review and apply the best category or categories.

    For new categories, the system allows for 10 custom categories, but that's more for creating whitelists/blacklists and for manually overriding existing URLs to one of those 10 categories. A new category that machines can use to categorize new sites requires a machine learning process, so there's no way for customers to add completely new categories.

    On the last item, feedback on re-categorization requests: that's not something we provide, and we are not planning on providing it.
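The block/pass conflict mentioned above, where one URL carries several categories, can be sketched as follows. This is only an illustration: the "any blocked category wins" rule, the `Action` enum, the `decide` function and the sample policy are all assumptions made for the example, not documented Untangle or Zvelo behaviour.

```python
from enum import Enum


class Action(Enum):
    PASS = "pass"
    BLOCK = "block"


def decide(categories, policy, default=Action.PASS):
    """Resolve a URL's categories to one action (assumed rule: block wins)."""
    # Categories missing from the policy fall back to the default action.
    actions = [policy.get(category, default) for category in categories]
    return Action.BLOCK if Action.BLOCK in actions else Action.PASS


# Hypothetical site policy: social networking blocked, search engines allowed.
policy = {
    "social networking": Action.BLOCK,
    "search engines": Action.PASS,
}

# A URL tagged with both categories is blocked under the assumed rule.
both = decide(["search engines", "social networking"], policy)

# The same URL re-categorised as "search engines" alone would pass.
search_only = decide(["search engines"], policy)
```

Under this assumed rule, a single re-categorisation is enough to flip a URL from blocked to passed, which is why the change to plus.google.com matters to anyone blocking social networking.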

  6. #6
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    www.magiclinkhandwriting.com - a course (DVD / CD) to teach cursive (joined-up) handwriting - classified by Zvelo as "Illegal Drugs"

    [Attached image: Screen Shot 2017-05-23 at 12.14.43.png]

    LOL

    Reported the miscategorisation and eager to check what category gets applied...

  7. #7
    Master Untangler
    Join Date
    Feb 2016
    Posts
    218

    Seriously…

    So even with the description of the re-categorization process mostly in hand, there are important missing pieces. Judging from the site copyright (2015), either this site was categorized incorrectly from the beginning (whether or not that was because a new category would have needed to be created at the time shouldn't matter a year or more on), or the re-categorization process isn't straightforward, or it is subject to manipulation.

    Ultimately a Web filter is a blunt instrument, at least for the foreseeable future. Some end user diligence is going to be involved. But it isn't confidence-building to see this kind of confusion in the general education field. I could see it happening more easily in fields like medicine or medical education/information.

    EDIT: I do wonder how category lists respond to transient stuff, like a site or server hack. An otherwise safe domain might temporarily turn into something else entirely, or be temporarily redirected to something entirely different. Suppose that happens and the category lists respond promptly and appropriately. But then what?
    Last edited by Sam Graf; 05-23-2017 at 06:50 AM. Reason: Added a thought…

  8. #8
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    EDIT: I do wonder how category lists respond to transient stuff, like a site or server hack. An otherwise safe domain might temporarily turn into something else entirely, or be temporarily redirected to something entirely different. Suppose that happens and the category lists respond promptly and appropriately. But then what?
    There are more dynamic approaches to filtering (e.g. Smoothwall use a proprietary engine 'G3' based on their experience of developing and using DansGuardian for around 15 years). It requires more processing power than a block list, and those users I know say it has a tendency to over-block. It will likely be better at blocking nastier server / site hacks, but will still require daily management.

  9. #9
    Master Untangler
    Join Date
    Feb 2016
    Posts
    218

    Originally posted by sharrisonUK:
    There are more dynamic approaches to filtering … using DansGuardian …
    Last time I worked with a product using DansGuardian, that was a content filter. And my pass list was long, though in fairness I was at an organization working in one of those gray areas when it comes to accurately filtering content (sexual health). By default DansGuardian does filter conservatively, though that's adjustable.

    If there is an advantage to a content filter, it's that responsibility and accountability are more local, so control is more local. A category list is "out there" somewhere.
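To make the contrast with a category list concrete, here is a minimal sketch of weighted-phrase scoring, loosely modelled on how content filters in the DansGuardian family are usually described. It is a simplification, not DansGuardian's actual implementation; the phrases, weights and limit are invented for the example.

```python
import re


def score_page(text, weighted_phrases):
    """Sum the weight of every occurrence of each phrase in the text."""
    lowered = text.lower()
    total = 0
    for phrase, weight in weighted_phrases.items():
        hits = len(re.findall(re.escape(phrase.lower()), lowered))
        total += hits * weight
    return total


def should_block(text, weighted_phrases, limit):
    """Block when the page score exceeds the locally chosen limit."""
    return score_page(text, weighted_phrases) > limit


# Hypothetical phrase list; negative weights offset phrases in benign context.
phrases = {"casino": 40, "poker": 30, "responsible gambling": -50}

blocked = should_block("Poker night at the casino", phrases, limit=50)
allowed = should_block(
    "Advice on responsible gambling for casino visitors", phrases, limit=50
)
```

The phrase weights and the limit live on the local box, which is the "more local" control described above. It also hints at why gray-area sites lead to long pass lists: a conservative phrase list or a low limit over-blocks until someone tunes it.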

  10. #10
    Master Untangler
    Join Date
    Mar 2009
    Posts
    115

    A category list is "out there" somewhere.
    Yes, I agree, and it depends on the focus of any one provider's category database. I guess we have to accept that Zvelo serve the advertising tech, network security and mobile service provider sectors; they don't say they are Education or Medical or any number of other market specialists outside of those three... I guess that's where our expertise comes in to monitor and intervene; just a shame we can't easily automate it!
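One part of that monitoring could in principle be scripted: comparing two saved snapshots of category lookups for the URLs you care about and flagging any that changed. The sketch below assumes the snapshots already exist as plain dictionaries; how they would be collected is left open, since no lookup API is described in this thread, and `find_category_drift` is a hypothetical helper.

```python
def find_category_drift(previous, current):
    """Return {url: (old_categories, new_categories)} for URLs that changed."""
    drift = {}
    for url, old in previous.items():
        new = current.get(url)
        # Skip URLs missing from the newer snapshot; compare ignoring order.
        if new is not None and set(new) != set(old):
            drift[url] = (sorted(old), sorted(new))
    return drift


# Snapshots keyed by URL; the values mirror the example from this thread.
earlier = {
    "plus.google.com": ["social networking"],
    "www.google.com": ["search engines"],
}
today = {
    "plus.google.com": ["search engines"],
    "www.google.com": ["search engines"],
}

changes = find_category_drift(earlier, today)
for url, (old, new) in changes.items():
    print(f"{url}: {old} -> {new}")
```

This only detects that a category changed; deciding whether the new category is wrong, and submitting a re-categorisation request, would still be a manual step.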
