managing very large amount of data

EBIHARA, Yuichiro

managing very large amount of data

Hi experts,

I'm now planning a system that will store 15,000,000+ items of
structured data and provide basic CRUD functionality for them.
While XWiki is my first candidate for the time being, I'm
worried about whether XWiki can manage such a large amount of data.

Can anyone tell me how large the largest XWiki systems are?

I'd also like to know what kinds of considerations are needed to
build a large XWiki system.
Which database is recommended? Is MySQL OK? Is it a good idea to
manually apply MySQL's partitioned table feature to large tables? And
others...

Any suggestion would be greatly appreciated.

Thanks in advance,

ebi
_______________________________________________
users mailing list
[hidden email]
http://lists.xwiki.org/mailman/listinfo/users
Guillaume Lerouge

Re: managing very large amount of data

Hi Ebi,

On Wed, Oct 14, 2009 at 3:57 AM, EBIHARA, Yuichiro <[hidden email]> wrote:

> Hi experts,
>
> I'm now planning a system that will store 15,000,000+ items of
> structured data and provide basic CRUD functionality for them.
> While XWiki is my first candidate for the time being, I'm
> worried about whether XWiki can manage such a large amount of data.
>

From a functional point of view, XWiki definitely sounds appropriate for this
need. With XObjects + inline editing it's quite easy to create a CRUD-based
system in XWiki.

Out of curiosity, may I ask what alternatives you're considering?

> Can anyone tell me how large the largest XWiki systems are?
>

The biggest XWiki systems I'm aware of have around 300,000 documents, I
think... I don't know of a deployment that would top 1,000,000 documents
(though there might be one out there in the wider community).

I guess that theoretically, provided the right infrastructure, there's no
reason why XWiki wouldn't be able to handle this. However since it has never
been done so far (at least not that I know of) it would probably require
some work.

Now since such a big system would require work anyway the question is
whether XWiki is a sound option to consider. I'll let other people answer as
it requires deeper technical knowledge.

I'd be glad to know what you're considering using XWiki for. 15,000,000+
entries sounds like a big system and I'm wondering what it could be :-)

> I'd also like to know what kinds of considerations are needed to
> build a large XWiki system.
> Which database is recommended? Is MySQL OK? Is it a good idea to
> manually apply MySQL's partitioned table feature to large tables? And
> others...
>
> Any suggestion would be greatly appreciated.
>
> Thanks in advance,
>

Have a nice day,

Guillaume

--
Guillaume Lerouge
Product Manager - XWiki SAS
Skype: wikibc
Twitter: glerouge
http://guillaumelerouge.com/
Ludovic Dubost-2

Re: managing very large amount of data

In reply to this post by EBIHARA, Yuichiro

Hi Yuichiro

EBIHARA, Yuichiro wrote:
> Hi experts,
>
> I'm now planning a system that will store 15,000,000+ items of
> structured data and provide basic CRUD functionality for them.
>
Nice, this sounds like an ambitious project...

Are you talking about 15,000,000 pages? How many additional metadata
fields would each page have?
> While XWiki is my first candidate for the time being, I'm
> worried about whether XWiki can manage such a large amount of data.
>
> Can anyone tell me how large the largest XWiki systems are?
>
>
Our largest system is currently Curriki.org. It has on the order of 100K
pages, but the pages are large (they have many XWiki objects).
The data volume is also big: 20GB. We have many installations running at
that size (mostly because of attachments).

> I'd also like to know what kinds of considerations are needed to
> build a large XWiki system.
>
You absolutely need to implement a custom mapping for your structures.
Otherwise it will be too big.

> Which database is recommended? Is MySQL OK? Is it a good idea to
>
This is a good question. I'm not a MySQL specialist, and 15 million rows
is a lot.
Postgres might be a good candidate too for that volume.
You should search on the Internet.

Depending on your confidentiality needs, maybe an XWiki storage
implementation over Google App Engine could be a good candidate.
It also depends on how much time and budget you have for this application.

> manually apply MySQL's partitioned table feature to large tables? And
>  
Anything that is transparent to the application and that allows good
scalability is good.

> others...
>
> Any suggestion would be greatly appreciated.
>  
There are some things you absolutely need to avoid, for example showing
all the data in a space on one page, or running a search with rights
checking on a common term. That will clearly not work, and just calling
it once could bring down your install. You might want to write a custom
rights manager to disable rights altogether, if that is an option.
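
As an illustration of the "never show everything on one page" advice, here is a minimal sketch of keyset pagination. It uses Python's built-in sqlite3 module rather than XWiki's own query API, and the `record` table and the `fetch_page` helper are hypothetical names chosen only for this example.

```python
import sqlite3

def fetch_page(conn, after_id=0, page_size=50):
    """Return at most page_size rows whose id is greater than after_id."""
    return conn.execute(
        "SELECT id, title FROM record WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()

# Hypothetical in-memory table standing in for a very large data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO record VALUES (?, ?)",
    [(i, f"Record {i}") for i in range(1, 1001)],
)

first = fetch_page(conn)
# The next page starts after the last id already shown.
second = fetch_page(conn, after_id=first[-1][0])
print(len(first), second[0][0])
```

Each request touches only one page of rows, so the cost does not grow with the total size of the space.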


I would also suggest that if you have some budget for this project, you
look into getting some support to do it right. Initial choices could be
tough to fix with that amount of data.
(XWiki SAS provides support, http://www.xwiki.com, or other experienced
developers on this list might be able to help.)

Ludovic

--
Ludovic Dubost
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost

EBIHARA, Yuichiro

Re: managing very large amount of data

Guillaume, Ludovic,

Thank you for your quick responses and kind advice.

What I'm planning is a bibliographical database system.
Briefly speaking, the database will eventually have 15,000,000+
bibliographical records and 2,000,000+ author records. Both of them
have a small number of fields (at most 10?) and one long text field that
users can freely edit.
Since they have N:N relations with each other, in a relational model
design the table that stores the relations can also become large.
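
The relational shape described here can be sketched as follows. This is only an illustration in plain SQL, run through Python's built-in sqlite3 module; it is not XWiki's storage layout, and all table and column names (`record`, `author`, `record_author`, `notes`) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    notes TEXT              -- the long, freely editable text field
);
CREATE TABLE author (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    notes TEXT
);
-- One row per (record, author) pair; this is the table that can grow large.
CREATE TABLE record_author (
    record_id INTEGER NOT NULL REFERENCES record(id),
    author_id INTEGER NOT NULL REFERENCES author(id),
    PRIMARY KEY (record_id, author_id)
);
-- Index for the reverse lookup (all records by a given author).
CREATE INDEX idx_record_author_author ON record_author(author_id, record_id);
""")

conn.execute("INSERT INTO record (id, title) VALUES (1, 'Example title')")
conn.executemany(
    "INSERT INTO author (id, name) VALUES (?, ?)",
    [(1, "Author A"), (2, "Author B")],
)
conn.executemany("INSERT INTO record_author VALUES (?, ?)", [(1, 1), (1, 2)])

# Look up the authors of record 1 through the relation table.
authors = [row[0] for row in conn.execute(
    "SELECT a.name FROM author a "
    "JOIN record_author ra ON ra.author_id = a.id "
    "WHERE ra.record_id = ? ORDER BY a.id",
    (1,),
)]
print(authors)
```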

Currently, my first option is XWiki and the other is to develop
everything from scratch. But I don't like the latter...
Actually, the amount of data may not be very important because the
estimated number of concurrent users is quite small.

> You absolutely need to implement a custom mapping for your structures.
> Otherwise it will be too big.

Great! It looks like the very feature I should use!
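
For a rough sense of why the mapping matters at this scale, here is a back-of-the-envelope sketch. It rests on an assumption that is not stated in this thread: that without a custom mapping each property value of each object is stored as its own row in shared property tables, while a custom mapping stores one row per item in a dedicated table. The item counts and the figure of 11 fields per item come from the description above.

```python
RECORDS = 15_000_000
AUTHORS = 2_000_000
FIELDS_PER_ITEM = 11  # "at most 10" short fields plus one long text field

# Assumption: generic storage keeps one row per property value.
generic_rows = (RECORDS + AUTHORS) * FIELDS_PER_ITEM
# Assumption: a custom mapping keeps one row per item in a dedicated table.
mapped_rows = RECORDS + AUTHORS

print(f"{generic_rows:,} vs {mapped_rows:,}")
```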

Anyway, I still think XWiki is worth studying further, and I'll
start prototyping.

Thanks,

ebi

Sergiu Dumitriu-2

Re: managing very large amount of data

On 10/14/2009 01:52 PM, EBIHARA, Yuichiro wrote:

> Guillaume, Ludovic,
>
> Thank you for your quick responses and kind advice.
>
> What I'm planning is a bibliographical database system.
> Briefly speaking, the database will eventually have 15,000,000+
> bibliographical records and 2,000,000+ author records. Both of them
> have a small number of fields (at most 10?) and one long text field that
> users can freely edit.
> Since they have N:N relations with each other, in a relational model
> design the table that stores the relations can also become large.

One possible issue I can see is the fact that each entity uses a 64-bit
hash as its identifier, which can cause collisions. Normally that rarely
happens, but as the number of entities increases, so does the chance of
it happening.

This means that some pairs of document names could not be used at the
same time, so you'll have to rename one of them.
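
To put a rough number on that risk, here is a minimal sketch using the standard birthday-bound approximation. It assumes the identifiers behave like uniformly random 64-bit values and counts only the 15,000,000 bibliographic records plus 2,000,000 author records mentioned above; the `collision_probability` helper is a name made up for this example.

```python
import math

def collision_probability(n_items: int, hash_bits: int = 64) -> float:
    """Approximate P(at least one collision) as 1 - exp(-n(n-1) / (2 * 2^bits))."""
    space = 2.0 ** hash_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-n_items * (n_items - 1) / (2.0 * space))

# 15,000,000 bibliographic records plus 2,000,000 author records
p = collision_probability(17_000_000)
print(f"{p:.2e}")
```

Under these assumptions the chance of any collision is on the order of one in a hundred thousand, so it is unlikely but not impossible. The probability grows roughly with the square of the number of entities.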

--
Sergiu Dumitriu
http://purl.org/net/sergiu/
EBIHARA, Yuichiro

Re: managing very large amount of data

Sergiu,

Thank you for your comment.

2009/10/17 Sergiu Dumitriu <[hidden email]>:
> One possible issue I can see is the fact that each entity uses a 64-bit
> hash as its identifier, which can cause collisions. Normally that rarely
> happens, but as the number of entities increases, so does the chance of
> it happening.
>
> This means that some pairs of document names could not be used at the
> same time, so you'll have to rename one of them.

I'll consider using an auto-generated number as the document name and
hiding it from users.
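
A minimal sketch of that idea, assuming a hypothetical `Record` prefix and eight-digit zero padding (neither is prescribed by XWiki):

```python
import itertools

def document_names(prefix="Record", start=1, width=8):
    """Yield sequential, zero-padded document names such as Record00000001."""
    for n in itertools.count(start):
        yield f"{prefix}{n:0{width}d}"

names = document_names()
print(next(names), next(names))
```

Because users never see these names, a name whose identifier happens to collide with an existing one could simply be skipped in favour of the next number.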

Thanks,

ebi