Wednesday, December 25, 2013

web screen scraping tools

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that lets you invoke pages, fill out forms, click links, etc., just as you do in your "normal" browser.
http://htmlunit.sourceforge.net/


Tool: http://screen-scraper.com/

Java library list: http://www.manageability.org/blog/stuff/screen-scraping-tools-written-in-java/view

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so you can effectively use JTidy as a DOM parser for real-world HTML.
http://jtidy.sourceforge.net/

jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods.


http://andreas-hess.info/programming/webcrawler/index.html


Monday, December 16, 2013

C++ migrating to Visual Studio 2012/13 from VC6

Read here
Lessons learned migrating to Visual Studio 2012 and .NET 4.5 - CodeProject:

and here, about the CDatabase and CRecordset issue:
http://stackoverflow.com/questions/12714310/crecordsetsnapshot-doesnt-work-in-vs2012-anymore-whats-the-alternative
One of the major ODBC changes Microsoft made in MFC is the cursor type used when opening a DB connection.
It changed from SQL_CUR_USE_ODBC to SQL_CUR_USE_DRIVER.


It seems that with the default Microsoft implementation in VS2013 (i.e. SQL_CUR_USE_DRIVER), accessing a DB2 database via MFC/ODBC gives no runtime error, but accessing MSSQL via ODBC fails at runtime with a "feature not implemented" error when trying to update a snapshot CRecordset.

The fix at the moment, as explained in the link above, is to override CDatabase::OpenEx and use the "old" cursor type, SQL_CUR_USE_ODBC.


In case of huge .obj files, follow this:
Avoid deriving from a template class. It is better to use the Proxy pattern and hold the template object as a data member.
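
A minimal sketch of that idea, not code from the migrated project: `Registry<T>` and `NameList` are made-up names that stand in for the real template class and the class that used to derive from it. The wrapper holds the template as a data member and forwards only the calls it needs. Whether this actually shrinks the .obj files depends on the project; in practice the forwarding functions would probably live in a single .cpp file, so the template is instantiated in one place only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical template class; stands in for whatever template the
// project classes were deriving from.
template <typename T>
class Registry {
public:
    void add(const T& item) { items_.push_back(item); }
    std::size_t size() const { return items_.size(); }
    const T& at(std::size_t i) const { return items_.at(i); }

private:
    std::vector<T> items_;
};

// Instead of `class NameList : public Registry<std::string>`, keep the
// template object as a data member and expose a small forwarding interface.
class NameList {
public:
    void add(const std::string& name) { names_.add(name); }
    std::size_t count() const { return names_.size(); }
    const std::string& first() const { return names_.at(0); }

private:
    Registry<std::string> names_;  // the proxied object
};

// Small usage check: callers see only NameList, never Registry<T>.
inline void demoNameList() {
    NameList list;
    list.add("alpha");
    list.add("beta");
    assert(list.count() == 2);
    assert(list.first() == "alpha");
}
```

Callers only depend on the narrow `NameList` interface, so the template's full member set is not pulled into every class that would otherwise inherit from it.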



ATL and MFC changes and fixes in Visual Studio 2013