24 May'11
Scrapy 0.12 Parsing with python
Based on Scrapy Tutorial (dead link: doc.scrapy.org/intro/tutorial.html)
- Install scrapy and dependencies
sudo apt-get install python-lxml
sudo easy_install -U Scrapy
- Create project
scrapy startproject dmoz
- Create item models
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
- Create spiders (in projname/spiders/)
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
    name = "dmoz.org"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/"
    ]

    def parse(self, response):
        hxs …
24 May'11
BeautifulSoup Parsing
1. Import dependencies
import urllib
from BeautifulSoup import BeautifulSoup
2. Settings
site = "http://****.in.ua"
base = site + "/url/****.html"
parse_urls = ["?page=1",]
parsed = []
urls = []
3. Prepopulate the URL bank with paging
def parser(fun):
    element = parse_urls.pop()
    parsed.append(element)
    page = urllib.urlopen(base + element)
    soup = BeautifulSoup(page.read())
    for topic in soup.findAll(True, 'right_block'):
        urls.append(topic.p.a["href"])
    for link in soup.find(id="page_list").findAll('li'):
        if (link.a["href"] not in parse_urls and link.a["href"] not in parsed):
            parse_urls.append(link.a["href"])

while(len(parse_urls) != 0):
    parser(blog_parse)
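The paging step is essentially a worklist traversal: pop a page, record it as visited, and queue any pager link that has not been seen yet. Here is a minimal sketch of just that loop, with no network access. FAKE_SITE, crawl and fetch are hypothetical names made up for the illustration, and the in-memory dictionary stands in for the real pages.

```python
# Hypothetical in-memory "site": page query -> (topic links, pager links).
# It replaces urllib + BeautifulSoup so the loop can run offline.
FAKE_SITE = {
    "?page=1": (["/topic/a", "/topic/b"], ["?page=1", "?page=2"]),
    "?page=2": (["/topic/c"], ["?page=1", "?page=2", "?page=3"]),
    "?page=3": (["/topic/d"], ["?page=2", "?page=3"]),
}


def crawl(start, fetch):
    """Visit every page reachable through pager links exactly once."""
    pending = [start]  # pages still to visit (parse_urls in the post)
    visited = []       # pages already fetched (parsed in the post)
    topics = []        # collected topic links (urls in the post)
    while pending:
        page = pending.pop()
        visited.append(page)
        topic_links, pager_links = fetch(page)
        topics.extend(topic_links)
        for link in pager_links:
            # Queue a pager link only if it is neither queued nor done.
            if link not in pending and link not in visited:
                pending.append(link)
    return visited, topics


visited, topics = crawl("?page=1", FAKE_SITE.__getitem__)
print(visited)
print(sorted(topics))
```

Passing the fetch function in as an argument keeps the traversal separate from the HTML parsing, so the same loop could be pointed at a real fetcher later.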
4. Parse
pages = []
for url …
24 May'11
Installing MongoDB 1.8.1 on Ubuntu 11.04 and PyMongo
Install everything you need:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo nano /etc/apt/sources.list
Next, add a line to sources.list:
- on Ubuntu
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
- on Debian
deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen
sudo apt-get update
sudo apt-get install mongodb-10gen
sudo apt-get install python-setuptools
sudo easy_install pymongo
Next, test the connection:
from pymongo.connection import Connection
from pymongo import ASCENDING

connection = Connection("localhost", 27017)
db = connection.test
db.my_collection.save({"x": 10})
db.my_collection.save({"x": 10, "y": "good"})
for item in db.my_collection.find …
05 Apr'11
Ruby on Rails 3 installation on Debian 6 Squeeze
apt-get install libsqlite3-dev curl git build-essential zlib1g-dev libssl-dev
bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )
if [[ -s "$HOME/.rvm/scripts/rvm" ]] ; then source "$HOME/.rvm/scripts/rvm" ; fi
rvm install 1.9.2
rvm --default ruby-1.9.2
gem install rails
rails new testapp
in $HOME/.profile
export PATH=$PATH:/var/lib/gems/1.8/bin
comment out sqlite deps in Gemfile
then run rails server: rails s
UPD 02.03.2012 obsolete replacement:
Package libreadline5-dev is not available, but is referred to by another package. This may mean that the package is missing, has been obsoleted, or is …
02 Apr'11
April Fools Prank on Squid
https://help.ubuntu.com/community/Upside-Down-TernetHowTo
19 Mar'11
SSH passwordless login
On your machine
local $ ssh-keygen -t rsa
Do not type anything, just press Enter at each prompt. Next,
local $ scp ~/.ssh/id_rsa.pub root@ipaddress:/root/.ssh/authorized_keys
local $ rm ~/.ssh/id_rsa.pub
local $ ssh root@ipaddress
that should guide you directly to the command prompt
UPD 20.04.2014: There is a great Linux program ssh-copy-id that does exactly everything mentioned above, but in one line instead!
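One difference worth knowing: the scp command above replaces the remote authorized_keys file, so any keys already stored there are lost, while ssh-copy-id appends to it. Below is a rough Python sketch of that append step only, run against a throwaway local directory. append_public_key and the placeholder key strings are made up for the illustration; this is not how ssh-copy-id itself is implemented.

```python
import os
import tempfile


def append_public_key(ssh_dir, public_key):
    """Append public_key to ssh_dir/authorized_keys unless it is already there."""
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")
    key = public_key.strip()
    content = ""
    if os.path.exists(path):
        with open(path) as handle:
            content = handle.read()
    if key in (line.strip() for line in content.splitlines()):
        return False  # key is already authorized, leave the file alone
    with open(path, "a") as handle:
        # Make sure the new key starts on its own line.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(path, 0o600)  # keep the file private to its owner
    return True


# Demo in a temporary directory; the key strings are placeholders, not real keys.
demo_dir = os.path.join(tempfile.mkdtemp(), ".ssh")
first = append_public_key(demo_dir, "ssh-rsa AAAA-placeholder-1 alice@laptop")
second = append_public_key(demo_dir, "ssh-rsa AAAA-placeholder-2 bob@desktop")
repeat = append_public_key(demo_dir, "ssh-rsa AAAA-placeholder-1 alice@laptop")
with open(os.path.join(demo_dir, "authorized_keys")) as handle:
    stored = handle.read().splitlines()
print(stored)
```

Both keys end up in the file, and adding the first one a second time changes nothing.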
18 Mar'11
Django nginx Debian
- How to make a simple install of Django on a small Debian 6 VPS.
- I’ll stick with flup, which enables Python to serve FastCGI and some other protocols.
- I use it in conjunction with nginx, which in turn is used to save memory.
Literature:
- http://library.linode.com/using-linux/administration-basics#system_diagnostics
- http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/
- http://www.mindinmotion.ru/post/django-postgresql-nginx-on-debian-server
1 upgrade the system
apt-get upgrade
2 install required dependencies
apt-get install nginx-light postgresql python-django python-psycopg2 python-flup python-imaging
3 configure nginx
you may want to use emacs, vim, or nano. in case of last - you should …
18 Mar'11
nano configuration
in ~/.nanorc you may add
set tabsize 3
set autoindent
if you want to simplify config files editing
P.S. be careful with python and tabs:
- http://www.secnetix.de/olli/Python/block_indentation.hawk
- http://codeghar.wordpress.com/2008/11/18/python-tabs-or-spaces/
- http://www.cs.caltech.edu/courses/cs11/material/python/misc/python_style_guide.html
- http://www.python.org/dev/peps/pep-0008/
- http://pthree.org/2007/01/31/python-and-the-horrendous-tab-character/
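To see whether a Python file already has the tab problem described in the links above, a small checker along these lines may help. It is only a sketch: indentation_problems is a made-up helper that flags lines whose leading whitespace contains tabs, alone or mixed with spaces.

```python
def indentation_problems(source):
    """Return (line_number, problem) pairs for tab-related indentation issues."""
    problems = []
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines cannot break block structure
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if " " in indent and "\t" in indent:
            problems.append((number, "mixed tabs and spaces"))
        elif "\t" in indent:
            problems.append((number, "tab indentation"))
    return problems


# Line 2 is indented with a tab, line 6 with a space followed by a tab.
sample = "def f():\n\treturn 1\n\ndef g():\n    x = 1\n \treturn x\n"
report = indentation_problems(sample)
print(report)
```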
10 Mar'11
How to capitalize a word in C#
This can be easily done via TextInfo class:
name = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(name);
Additionally, the ForEach method of List<T> with a lambda can apply this to every item in a list:
lst.ForEach(ci => ci.Name = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(ci.Name));
25 Feb'11
Windows Server 2008 configuration
You may want to disable the auto-start of both Initial Configuration and Server Manager for a while, and re-enable them later. Disabling them is especially useful when the server is used as a desktop system.
Open the registry at HKLM\Software\Microsoft\ServerManager and change the DoNotOpenServerManagerAtLogon value from 0 to 1.
To enable Initial configuration, run oobe.